{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T23:06:37Z","timestamp":1777158397192,"version":"3.51.4"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T00:00:00Z","timestamp":1684108800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T00:00:00Z","timestamp":1684108800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Pers Ubiquit Comput"],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract\n<\/jats:title><jats:p>Album art often reflects the trends and themes of the songs in a given collection, and even the identities of the musicians who produced it. It therefore plays a central role in fomenting a potential listener\u2019s first impression of the work. As such, musicians strive to find suitable images for this purpose, and those with limited financial resources or design skills may struggle to do so. Here, we report the development of Visualyre, a deep learning\u2013based application that generates album art images from users\u2019 song lyrics and audio files. This tool relies on generative adversarial network models to generate images from textual input (lyrics) and style transfer models to adjust the image according to the mood of the audio. We then report the results of a user study involving 35 amateur and independent musicians who tested the system. 
Results suggest that Visualyre was generally well received and largely effective in its intended purpose: providing musicians with a resource for generating their own album art.<\/jats:p>","DOI":"10.1007\/s00779-023-01724-1","type":"journal-article","created":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T15:10:48Z","timestamp":1684163448000},"page":"1861-1872","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Visualyre: multimodal album art generation for independent musicians"],"prefix":"10.1007","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2247-6766","authenticated-orcid":false,"given":"Gamar","family":"Azuaje","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kongmeng","family":"Liew","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Elena","family":"Epure","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuntaro","family":"Yada","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shoko","family":"Wakamiya","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eiji","family":"Aramaki","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,5,15]]},"reference":[{"key":"1724_CR1","unstructured":"James S (2014) The evolution of album art through decades of music. Dissertation, Radford University"},{"key":"1724_CR2","unstructured":"Belton RJ (2015) The narrative potential of album covers. 
Studies in Visual Arts and Communication: an International Journal 2(2):1\u20137"},{"key":"1724_CR3","doi-asserted-by":"publisher","unstructured":"M\u00fchlbach S, Arora P (2020) Behind the music: how labor changed for musicians through the subscription economy. First Monday, 25(4). https:\/\/doi.org\/10.5210\/fm.v25i4.10382","DOI":"10.5210\/fm.v25i4.10382"},{"key":"1724_CR4","unstructured":"Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems"},{"key":"1724_CR5","unstructured":"Gatys LA, Ecker AS, Bethge M (2015) A neural algorithm of artistic style. CoRR abs\/1508.06576. https:\/\/arxiv.org\/abs\/1508.06576. Accessed 15 April 2021"},{"key":"1724_CR6","unstructured":"Laurier C, Herrera P (2008) Mood cloud: a real-time music mood visualization tool. In: Proceedings of the 10th International Society for Music Information Conference"},{"key":"1724_CR7","unstructured":"Husain A, Shiratuddin MF, Kok WW (2015) Establishing a framework for visualizing music mood using visual texture. In: 5th International Conference on Computing and Informatics"},{"key":"1724_CR8","unstructured":"Funasawa S, Ishizaki H, Hoashi K, Takishima Y, Katto J (2010) Automated music slideshow generation using web images based on lyrics. In: Proceedings of the 11th International Society for Music Information Retrieval Conference"},{"key":"1724_CR9","unstructured":"Radford A, Metz L, Chintala S (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In: 4th International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1511.06434. Accessed 15 April 2021"},{"key":"1724_CR10","unstructured":"Hepburn A, McConville R, Santos-Rodr\u00edguez R (2017) Album cover generation from genre tags. 
In: 10th International Workshop on Machine Learning and Music"},{"key":"1724_CR11","unstructured":"Odena A, Olah C, Shlens J (2017) Conditional image synthesis with auxiliary classifier gans. In: Proceedings of the 34th International Conference on Machine Learning 70:2642\u20132651. Available from https:\/\/proceedings.mlr.press\/v70\/odena17a.html"},{"issue":"8","key":"1724_CR12","doi-asserted-by":"publisher","first-page":"1947","DOI":"10.1109\/TPAMI.2018.2856256","volume":"41","author":"H Zhang","year":"2018","unstructured":"Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2018) Stackgan++: Realistic image synthesis with stacked generative adversarial networks. IEEE Trans Pattern Anal Mach Intell 41(8):1947\u20131962.\u00a0https:\/\/doi.org\/10.1109\/TPAMI.2018.2856256","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1724_CR13","doi-asserted-by":"publisher","unstructured":"Xu T, Zhang P, Huang Q, Zhang H, Gan Z, Huang X, He X (2018) Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In: 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1316\u20131324. https:\/\/doi.org\/10.1109\/CVPR.2018.00143","DOI":"10.1109\/CVPR.2018.00143"},{"key":"1724_CR14","doi-asserted-by":"publisher","unstructured":"Hong S, Yang D, Choi J, Lee H (2018) Inferring semantic layout for hierarchical text-to-image synthesis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7986\u20137994. https:\/\/doi.org\/10.1109\/CVPR.2018.00833","DOI":"10.1109\/CVPR.2018.00833"},{"key":"1724_CR15","doi-asserted-by":"publisher","unstructured":"Zhu M, Pan P, Chen W, Yang Y (2019) Dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis. In: IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5795\u20135803. 
https:\/\/doi.org\/10.1109\/CVPR.2019.00595","DOI":"10.1109\/CVPR.2019.00595"},{"key":"1724_CR16","doi-asserted-by":"publisher","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp 740\u2013755. https:\/\/doi.org\/10.1007\/978-3-319-10602-1_48","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"1724_CR17","doi-asserted-by":"publisher","unstructured":"Laurier C, Meyers O, Serra J, Blech M, Herrera P (2009) Music mood annotator design and integration. In: Seventh International Workshop on Content-Based Multimedia Indexing, pp 156\u2013161. https:\/\/doi.org\/10.1109\/CBMI.2009.45","DOI":"10.1109\/CBMI.2009.45"},{"key":"1724_CR18","doi-asserted-by":"publisher","unstructured":"Alonso-Jimenez P, Bogdanov D, Pons J, Serra X (2020) Tensorflow Audio Models in Essentia. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp 266\u2013270. https:\/\/doi.org\/10.1109\/ICASSP40776.2020.9054688","DOI":"10.1109\/ICASSP40776.2020.9054688"},{"key":"1724_CR19","doi-asserted-by":"publisher","unstructured":"Xie X, Tian F, Seah HS (2007) Feature Guided Texture Synthesis (FGTS) for artistic style transfer. In: Proceedings of the 2nd International Conference on Digital Interactive Media in Entertainment and Arts, pp 44\u201349. https:\/\/doi.org\/10.1145\/1306813.1306830","DOI":"10.1145\/1306813.1306830"},{"key":"1724_CR20","doi-asserted-by":"crossref","unstructured":"Li C, Wand M (2016) Precomputed real-time texture synthesis with Markovian generative adversarial networks. In: Computer Vision\u2013ECCV 2016: 14th European Conference, Proceedings, Part III 14, pp 702\u2013716","DOI":"10.1007\/978-3-319-46487-9_43"},{"key":"1724_CR21","doi-asserted-by":"crossref","unstructured":"Ghiasi G, Lee H, Kudlur M, Dumoulin V, Shlens J (2017) Exploring the structure of a real-time, arbitrary neural artistic stylization network. 
arXiv preprint http:\/\/arxiv.org\/abs\/1705.06830. Accessed 15 April 2021","DOI":"10.5244\/C.31.114"},{"issue":"4","key":"1724_CR22","doi-asserted-by":"publisher","first-page":"867","DOI":"10.1016\/j.jesp.2009.03.009","volume":"45","author":"DM Oppenheimer","year":"2009","unstructured":"Oppenheimer DM, Meyvis T, Davidenko N (2009) Instructional manipulation checks: detecting satisficing to increase statistical power. J Exp Soc Psychol 45(4):867\u2013872. https:\/\/doi.org\/10.1016\/j.jesp.2009.03.009","journal-title":"J Exp Soc Psychol"},{"issue":"2","key":"1724_CR23","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1109\/TPAMI.2015.2439281","volume":"38","author":"C Dong","year":"2015","unstructured":"Dong C, Loy CC, He K, Tang X (2015) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295\u2013307","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1724_CR24","doi-asserted-by":"crossref","unstructured":"Esser P, Rombach R, Ommer B (2020) Taming transformers for high-resolution image synthesis. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), vol 2021, pp 12873\u201312883","DOI":"10.1109\/CVPR46437.2021.01268"},{"key":"1724_CR25","unstructured":"Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I (2021) Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning. arXiv. 
https:\/\/arxiv.org\/abs\/2103.00020"}],"container-title":["Personal and Ubiquitous Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00779-023-01724-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00779-023-01724-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00779-023-01724-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,20]],"date-time":"2023-10-20T05:07:04Z","timestamp":1697778424000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00779-023-01724-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,15]]},"references-count":25,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["1724"],"URL":"https:\/\/doi.org\/10.1007\/s00779-023-01724-1","relation":{},"ISSN":["1617-4909","1617-4917"],"issn-type":[{"value":"1617-4909","type":"print"},{"value":"1617-4917","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,15]]},"assertion":[{"value":"31 March 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}