{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,15]],"date-time":"2025-08-15T02:08:23Z","timestamp":1755223703665,"version":"3.43.0"},"reference-count":41,"publisher":"World Scientific Pub Co Pte Ltd","issue":"12","funder":[{"name":"grants from the University Synergy Innovation Program of Anhui Province","award":["GXXT-2022-042"],"award-info":[{"award-number":["GXXT-2022-042"]}]},{"name":"grants from the University Synergy Innovation Program of Anhui Province","award":["GXXT-2022-101"],"award-info":[{"award-number":["GXXT-2022-101"]}]},{"name":"Open Research Fund of Key Laboratory of Philosophy and Social Science of Anhui Province","award":["SYS2023A06"],"award-info":[{"award-number":["SYS2023A06"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["sU20A20229"],"award-info":[{"award-number":["sU20A20229"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"The Key Projects of the National Natural Science Foundation of Universities in Anhui Province","award":["2023AH051302"],"award-info":[{"award-number":["2023AH051302"]}]},{"name":"The Scientific Research Foundation of Hefei Normal University","award":["2022rcjj57"],"award-info":[{"award-number":["2022rcjj57"]}]},{"name":"The Open Project of Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University","award":["MMC202405"],"award-info":[{"award-number":["MMC202405"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p> The understanding and representation of songs is a crucial issue for music platforms, as it can facilitate numerous applications in the music field. 
Songs are a common multimodal art form within the music domain, carrying rich musical connotations and strong expressiveness. However, song data are markedly multimodal and heterogeneous, which poses significant challenges to the understanding and representation of songs. Current methods do not address these challenges effectively. To this end, this study proposes SUMMR, a unified multimodal representation framework for songs. First, a two-layer framework is put forward. In the embedding layer, the features of data from different modalities are embedded into a unified space. In the content layer, a novel cross-modal attention mechanism is designed, which captures cross-modal semantic associations and deep music features, thereby obtaining a unified representation of songs. Then, a two-level hierarchical pre-training algorithm is proposed, which lowers the training cost. Finally, experiments are conducted on two typical music tasks with public song datasets; the results demonstrate the effectiveness of SUMMR for understanding and representing songs, and show that SUMMR can be fine-tuned well for many song-based tasks. 
<\/jats:p>","DOI":"10.1142\/s0218001425580017","type":"journal-article","created":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T05:14:35Z","timestamp":1741670075000},"source":"Crossref","is-referenced-by-count":0,"title":["SUMMR: A Unified Multimodal Representation Framework for Songs"],"prefix":"10.1142","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8926-0914","authenticated-orcid":false,"given":"Lei","family":"Ye","sequence":"first","affiliation":[{"name":"School of Music, Hefei Normal University, Hefei, P.\u00a0R.\u00a0China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7256-4848","authenticated-orcid":false,"given":"Bing","family":"Shen","sequence":"additional","affiliation":[{"name":"School of Arts & Communication, Beijing Normal University, Beijing, P.\u00a0R.\u00a0China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7950-4919","authenticated-orcid":false,"given":"Yu","family":"Su","sequence":"additional","affiliation":[{"name":"School of Computer and Artificial Intelligence, Hefei Normal University, Hefei, P.\u00a0R.\u00a0China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4369-4451","authenticated-orcid":false,"given":"Xiao","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Music, Hefei Normal University, Hefei, P.\u00a0R.\u00a0China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-3960-7154","authenticated-orcid":false,"given":"Yi","family":"Gong","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, P.\u00a0R.\u00a0China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5487-2647","authenticated-orcid":false,"given":"Yifei","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Music, Hefei Normal University, Hefei, P.\u00a0R.\u00a0China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0574-9519","authenticated-orcid":false,"given":"JunYu","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Big Data, 
University of Science and Technology of China, Hefei, P.\u00a0R.\u00a0China"}]}],"member":"219","published-online":{"date-parts":[[2025,7,23]]},"reference":[{"key":"S0218001425580017BIB001","doi-asserted-by":"publisher","DOI":"10.1109\/ICRTEC56977.2023.10111926"},{"key":"S0218001425580017BIB004","first-page":"101091","volume":"34","author":"Altan G.","year":"2022","journal-title":"Eng. Sci. Technol. Int. J."},{"key":"S0218001425580017BIB005","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2019.2931395"},{"volume-title":"Proc. Int. Conf. IoT Based Control Networks & Intelligent Systems (ICICNIS)","year":"2021","author":"Athulya K. M.","key":"S0218001425580017BIB006"},{"key":"S0218001425580017BIB008","first-page":"591","volume-title":"Proc. 12th Int. Conf. Music Information Retrieval Conference (ISMIR 2011)","author":"Bertin-Mahieux T.","year":"2011"},{"key":"S0218001425580017BIB009","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2973795"},{"key":"S0218001425580017BIB011","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13287-2_3"},{"key":"S0218001425580017BIB012","doi-asserted-by":"publisher","DOI":"10.2307\/2531595"},{"key":"S0218001425580017BIB014","doi-asserted-by":"publisher","DOI":"10.1002\/aris.1440370108"},{"key":"S0218001425580017BIB015","doi-asserted-by":"publisher","DOI":"10.1109\/ASYU.2018.8554016"},{"key":"S0218001425580017BIB016","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10095889"},{"key":"S0218001425580017BIB017","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2010.2098858"},{"key":"S0218001425580017BIB018","doi-asserted-by":"publisher","DOI":"10.1162\/neco_a_01273"},{"key":"S0218001425580017BIB019","first-page":"15037","volume-title":"Int. Conf. Machine Learning (ICML)","author":"Gardner J. 
P.","year":"2024"},{"key":"S0218001425580017BIB020","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219885"},{"key":"S0218001425580017BIB021","doi-asserted-by":"publisher","DOI":"10.1080\/09298215.2021.1977336"},{"issue":"1","key":"S0218001425580017BIB022","first-page":"9668018","volume":"2022","author":"He Q.","year":"2022","journal-title":"Math. Probl. Eng."},{"key":"S0218001425580017BIB023","doi-asserted-by":"publisher","DOI":"10.1111\/j.1468-2257.2012.00593.x"},{"key":"S0218001425580017BIB024","first-page":"937","volume-title":"Proc. 11th Int. Society for Music Information Retrieval Conf. (ISMIR, 2010)","volume":"86","author":"Kim Y. E.","year":"2010"},{"key":"S0218001425580017BIB025","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-49722-7"},{"key":"S0218001425580017BIB026","first-page":"167","author":"Kraaij W.","year":"1994","journal-title":"Informatiewetenschap"},{"key":"S0218001425580017BIB027","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-022-14252-6"},{"key":"S0218001425580017BIB028","first-page":"11","volume-title":"Proc. 1st Int. Symposium Music Information Retrieval","volume":"270","author":"Logan B.","year":"2000"},{"key":"S0218001425580017BIB029","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-21945-5"},{"key":"S0218001425580017BIB030","volume-title":"Multimodal Music Processing","volume":"3","author":"M\u00fcller M.","year":"2012"},{"first-page":"570","volume-title":"10th Int. Symp. Computer Music Multidisciplinary Research (CMMR 2013)","author":"Panda R. E. S.","key":"S0218001425580017BIB032"},{"key":"S0218001425580017BIB033","doi-asserted-by":"publisher","DOI":"10.1007\/s10867-010-9195-3"},{"issue":"2","key":"S0218001425580017BIB034","volume":"12","author":"Ponlatha S.","year":"2021","journal-title":"Int. J. Adv. Res. Sci. Commun. Technol."},{"key":"S0218001425580017BIB035","first-page":"1","volume-title":"2022 6th Int. Conf. 
Computation System and Information Technology for Sustainable Solutions (CSITSS)","author":"Prince S.","year":"2022"},{"key":"S0218001425580017BIB036","doi-asserted-by":"publisher","DOI":"10.1016\/j.conb.2019.06.005"},{"key":"S0218001425580017BIB037","first-page":"2980","volume-title":"Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 2017)","author":"Ross T.-Y.","year":"2017"},{"key":"S0218001425580017BIB038","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-8950-4_2"},{"key":"S0218001425580017BIB039","doi-asserted-by":"publisher","DOI":"10.1561\/1500000042"},{"key":"S0218001425580017BIB040","first-page":"241","volume-title":"16th Int. Society for Music Information Retrieval Conf.","author":"Schreiber H.","year":"2015"},{"key":"S0218001425580017BIB041","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-7403-6_11"},{"key":"S0218001425580017BIB042","doi-asserted-by":"publisher","DOI":"10.1109\/MMRP.2019.00012"},{"key":"S0218001425580017BIB045","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2002.800560"},{"volume-title":"Advances in Neural Information Processing Systems (NIPS 2017)","year":"2017","author":"Vaswani A.","key":"S0218001425580017BIB046"},{"key":"S0218001425580017BIB047","doi-asserted-by":"publisher","DOI":"10.1109\/GHCI50508.2021.9514020"},{"key":"S0218001425580017BIB048","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-14442-9_3"},{"key":"S0218001425580017BIB049","doi-asserted-by":"publisher","DOI":"10.1007\/s13042-010-0001-0"}],"container-title":["International Journal of Pattern Recognition and Artificial 
Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001425580017","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T06:25:01Z","timestamp":1754979901000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218001425580017"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,23]]},"references-count":41,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1142\/S0218001425580017"],"URL":"https:\/\/doi.org\/10.1142\/s0218001425580017","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"type":"print","value":"0218-0014"},{"type":"electronic","value":"1793-6381"}],"subject":[],"published":{"date-parts":[[2025,7,23]]},"article-number":"2558001"}}