{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:10:15Z","timestamp":1750219815086,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":29,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,10,29]],"date-time":"2023-10-29T00:00:00Z","timestamp":1698537600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Engineering and Physical Sciences Research Council grant","award":["EP\/V025708\/1"],"award-info":[{"award-number":["EP\/V025708\/1"]}]},{"name":"021 Alexa Prize TaskBot Grant"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,11,2]]},"DOI":"10.1145\/3607827.3616842","type":"proceedings-article","created":{"date-parts":[[2023,10,26]],"date-time":"2023-10-26T22:09:13Z","timestamp":1698358153000},"page":"51-59","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Generating Multimodal Augmentations with LLMs from Song Metadata for Music Information Retrieval"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3806-9575","authenticated-orcid":false,"given":"Federico","family":"Rossetto","sequence":"first","affiliation":[{"name":"University of Glasgow, Glasgow, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2422-8651","authenticated-orcid":false,"given":"Jeffrey","family":"Dalton","sequence":"additional","affiliation":[{"name":"University of Glasgow, Glasgow, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4228-7962","authenticated-orcid":false,"given":"Roderick","family":"Murray-Smith","sequence":"additional","affiliation":[{"name":"University of 
Glasgow, Glasgow, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,10,29]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.7316790"},{"key":"e_1_3_2_1_2_1","volume-title":"Pierre Vandergheynst, and Xavier Bresson.","author":"Benzi Kirell","year":"2016","unstructured":"Kirell Benzi, Micha\u00ebl Defferrard, Pierre Vandergheynst, and Xavier Bresson. 2016. FMA: A Dataset For Music Analysis. CoRR, Vol. abs\/1612.01840 (2016). arXiv:1612.01840 http:\/\/arxiv.org\/abs\/1612.01840"},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011)","author":"Bertin-Mahieux Thierry","year":"2011","unstructured":"Thierry Bertin-Mahieux, Daniel Ellis, Brian Whitman, and Paul Lamere. 2011. The Million Song Dataset. Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 591--596."},{"key":"e_1_3_2_1_4_1","unstructured":"Dmitry Bogdanov, Alastair Porter, Philip Tovstogan, and Minz Won. 2019a. MediaEval 2019: Emotion and Theme Recognition in Music Using Jamendo. In MediaEval Benchmarking Initiative for Multimedia Evaluation."},{"key":"e_1_3_2_1_5_1","volume-title":"The MTG-Jamendo Dataset for Automatic Music Tagging. 
In International Conference on Machine Learning.","author":"Bogdanov Dmitry","year":"2019","unstructured":"Dmitry Bogdanov, Minz Won, Philip Tovstogan, Alastair Porter, and Xavier Serra. 2019b. The MTG-Jamendo Dataset for Automatic Music Tagging. In International Conference on Machine Learning."},{"key":"e_1_3_2_1_6_1","volume-title":"Codified audio language modeling learns useful representations for music information retrieval. CoRR","author":"Castellon Rodrigo","year":"2021","unstructured":"Rodrigo Castellon, Chris Donahue, and Percy Liang. 2021. Codified audio language modeling learns useful representations for music information retrieval. CoRR, Vol. abs\/2107.05677 (2021). arXiv:2107.05677 https:\/\/arxiv.org\/abs\/2107.05677"},{"key":"e_1_3_2_1_7_1","volume-title":"Music Mood Detection Based On Audio And Lyrics With Deep Neural Net. CoRR","author":"Delbouys R\u00e9mi","year":"2018","unstructured":"R\u00e9mi Delbouys, Romain Hennequin, Francesco Piccoli, Jimena Royo-Letelier, and Manuel Moussallam. 2018. Music Mood Detection Based On Audio And Lyrics With Deep Neural Net. CoRR, Vol. abs\/1809.07276 (2018). 
arXiv:1809.07276 http:\/\/arxiv.org\/abs\/1809.07276"},{"key":"e_1_3_2_1_8_1","volume-title":"CNN Architectures for Large-Scale Audio Classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). https:\/\/arxiv.org\/abs\/1609.09430","author":"Hershey Shawn","year":"2017","unstructured":"Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron Weiss, and Kevin Wilson. 2017. CNN Architectures for Large-Scale Audio Classification. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). https:\/\/arxiv.org\/abs\/1609.09430"},{"key":"e_1_3_2_1_9_1","volume-title":"UnifiedQA: Crossing Format Boundaries With a Single QA System. CoRR","author":"Khashabi Daniel","year":"2020","unstructured":"Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. 2020. UnifiedQA: Crossing Format Boundaries With a Single QA System. CoRR, Vol. abs\/2005.00700 (2020). 
arXiv:2005.00700 https:\/\/arxiv.org\/abs\/2005.00700"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.1417647"},{"key":"e_1_3_2_1_11_1","volume-title":"Multi-Level and Multi-Scale Feature Aggregation Using Pre-trained Convolutional Neural Networks for Music Auto-tagging. CoRR","author":"Lee Jongpil","year":"2017","unstructured":"Jongpil Lee and Juhan Nam. 2017a. Multi-Level and Multi-Scale Feature Aggregation Using Pre-trained Convolutional Neural Networks for Music Auto-tagging. CoRR, Vol. abs\/1703.01793 (2017). arXiv:1703.01793 http:\/\/arxiv.org\/abs\/1703.01793"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2017.2713830"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9746996"},{"key":"e_1_3_2_1_14_1","volume-title":"Ehmann","author":"McCallum Matthew C.","year":"2022","unstructured":"Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, and Andreas F. Ehmann. 2022. Supervised and Unsupervised Learning of Audio Representations for Music Understanding. arxiv: 2210.03799 [cs.SD]"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5334\/tismir.10"},{"key":"e_1_3_2_1_17_1","volume-title":"Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features. CoRR","author":"Oramas Sergio","year":"2017","unstructured":"Sergio Oramas, Oriol Nieto, Francesco Barbieri, and Xavier Serra. 2017a. 
Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features. CoRR, Vol. abs\/1707.04916 (2017). arXiv:1707.04916 http:\/\/arxiv.org\/abs\/1707.04916"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3125486.3125492"},{"key":"e_1_3_2_1_19_1","unstructured":"Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. arxiv: 2203.02155 [cs.CL]"},{"key":"e_1_3_2_1_20_1","volume-title":"James Thorne, Yacine Jernite, Vassilis Plachouras, Tim Rockt\u00e4schel, and Sebastian Riedel.","author":"Petroni Fabio","year":"2020","unstructured":"Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick S. H. 
Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vassilis Plachouras, Tim Rockt\u00e4schel, and Sebastian Riedel. 2020. KILT: a Benchmark for Knowledge Intensive Language Tasks. CoRR, Vol. abs\/2009.02252 (2020). arXiv:2009.02252 https:\/\/arxiv.org\/abs\/2009.02252"},{"key":"e_1_3_2_1_21_1","volume-title":"International Society for Music Information Retrieval Conference.","author":"Pons Jordi","year":"2017","unstructured":"Jordi Pons, Oriol Nieto, Matthew Prockup, Erik M. Schmidt, Andreas F. Ehmann, and Xavier Serra. 2017. End-to-end Learning for Music Audio Tagging at Scale. In International Society for Music Information Retrieval Conference."},{"key":"e_1_3_2_1_22_1","volume-title":"Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever.","author":"Radford Alec","year":"2022","unstructured":"Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. Robust Speech Recognition via Large-Scale Weak Supervision. arxiv: 2212.04356 [eess.AS]"},{"key":"e_1_3_2_1_23_1","volume-title":"Liu","author":"Raffel Colin","year":"2019","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. 
Liu. 2019. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. CoRR, Vol. abs\/1910.10683 (2019). arXiv:1910.10683 http:\/\/arxiv.org\/abs\/1910.10683"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0077714"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2506364.2506365"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2390848.2390851"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2002.800560"},{"key":"e_1_3_2_1_28_1","volume-title":"CoRR","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. CoRR, Vol. abs\/1706.03762 (2017). arXiv:1706.03762 http:\/\/arxiv.org\/abs\/1706.03762"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Hang Zhao, Chen Zhang, Belei Zhu, Zejun Ma, and Kejun Zhang. 2022b. S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification. 
arxiv: 2202.10139 [eess.AS]","DOI":"10.1109\/ICASSP43922.2022.9746056"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME52920.2022.9859812"}],"event":{"name":"MM '23: The 31st ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Ottawa ON Canada","acronym":"MM '23"},"container-title":["Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3607827.3616842","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3607827.3616842","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:05Z","timestamp":1750178765000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3607827.3616842"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,29]]},"references-count":29,"alternative-id":["10.1145\/3607827.3616842","10.1145\/3607827"],"URL":"https:\/\/doi.org\/10.1145\/3607827.3616842","relation":{},"subject":[],"published":{"date-parts":[[2023,10,29]]},"assertion":[{"value":"2023-10-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}