{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T00:15:13Z","timestamp":1758672913934,"version":"3.44.0"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:p>Text-to-music (TTM) generation, which converts textual descriptions into audio, opens up innovative avenues for multimedia creation.\n\nAchieving high quality and diversity in this process demands extensive, high-quality data, which are often scarce in available datasets. Most open-source datasets frequently suffer from issues like low-quality waveforms and low text-audio consistency, hindering the advancement of music generation models.\n\nTo address these challenges, we propose a novel quality-aware training paradigm for generating high-quality, high-musicality music from large-scale, quality-imbalanced datasets. Additionally, by leveraging unique properties in the latent space of musical signals, we adapt and implement a masked diffusion transformer (MDT) model for the TTM task, showcasing its capacity for quality control and enhanced musicality. Furthermore, we introduce a three-stage caption refinement approach to address low-quality captions' issue. Experiments show state-of-the-art (SOTA) performance on benchmark datasets including MusicCaps and the Song-Describer Dataset with both objective and subjective metrics.\n\nDemo audio samples are available at https:\/\/qa-mdt.github.io\/, code and pretrained checkpoints are open-sourced at https:\/\/github.com\/ivcylc\/OpenMusic.<\/jats:p>","DOI":"10.24963\/ijcai.2025\/1126","type":"proceedings-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:10:40Z","timestamp":1758269440000},"page":"10135-10143","source":"Crossref","is-referenced-by-count":0,"title":["QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation"],"prefix":"10.24963","author":[{"given":"Chang","family":"Li","sequence":"first","affiliation":[{"name":"USTC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruoyu","family":"Wang","sequence":"additional","affiliation":[{"name":"USTC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lijuan","family":"Liu","sequence":"additional","affiliation":[{"name":"USTC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Du","sequence":"additional","affiliation":[{"name":"USTC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yixuan","family":"Sun","sequence":"additional","affiliation":[{"name":"USTC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zilu","family":"Guo","sequence":"additional","affiliation":[{"name":"USTC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhengrong","family":"Zhang","sequence":"additional","affiliation":[{"name":"USTC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuan","family":"Jiang","sequence":"additional","affiliation":[{"name":"USTC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianqing","family":"Gao","sequence":"additional","affiliation":[{"name":"iFlytek AI Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Feng","family":"Ma","sequence":"additional","affiliation":[{"name":"iFlytek AI Research"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"number":"34","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-2025","name":"Thirty-Fourth International Joint Conference on Artificial Intelligence {IJCAI-25}","start":{"date-parts":[[2025,8,16]]},"theme":"Artificial Intelligence","location":"Montreal, Canada","end":{"date-parts":[[2025,8,22]]}},"container-title":["Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:36:13Z","timestamp":1758627373000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2025\/1126"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2025\/1126","relation":{},"subject":[],"published":{"date-parts":[[2025,9]]}}}