{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T05:55:46Z","timestamp":1740808546569,"version":"3.38.0"},"reference-count":63,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T00:00:00Z","timestamp":1662595200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100008048","name":"nanjing university","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100008048","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No. 72074108"],"award-info":[{"award-number":["No. 72074108"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p> Intangible cultural heritage (ICH) songs convey folk lives and stories from different communities and nations through touching melodies and lyrics, which are rich in sentiments. Currently, researches about the sentiment analysis of songs are mainly based on lyrics, audios and lyric-audio. Recent studies have shown that deep spectrum features extracted from the spectrogram, generated from the audio, perform well in several speech-based tasks. However, studies combining spectrum features in multimodal sentiment analysis of songs are in a lack. Hence, we propose to combine the audio, lyric and spectrogram to conduct multimodal sentiment analysis for ICH songs, in a tri-modal fusion way. In addition, the correlations and interactions between different modalities are not considered fully. 
Here, we propose a multimodal song sentiment analysis model (MSSAM), including a strengthened audio features-guided attention (SAFGA) mechanism, which can learn intra- and inter-modal information effectively. First, we obtain strengthened audio features through the fusion of acoustic and spectrum features. Then, the strengthened audio features are used to guide the attention weights distribution of words in the lyric with help of SAFGA, which can make the model focus on the important words with sentiments and related with the sentiment of strengthened audio features, capturing modal interactions and complementary information. We take two world-level ICH lists, Jingju (\u4eac\u5267) and Kunqu (\u6606\u66f2), as examples, and build sentiment analysis datasets. We compare the proposed model with other state-of-the-arts baselines in Jingju and Kunqu datasets. Experimental results demonstrate the superiority of our proposed model. <\/jats:p>","DOI":"10.1177\/01655515221114454","type":"journal-article","created":{"date-parts":[[2022,9,9]],"date-time":"2022-09-09T06:53:31Z","timestamp":1662706411000},"page":"1063-1081","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Multimodal sentiment analysis of intangible cultural heritage songs with strengthened audio features-guided attention"],"prefix":"10.1177","volume":"50","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6846-2901","authenticated-orcid":false,"given":"Tao","family":"Fan","sequence":"first","affiliation":[{"name":"School of Information Management, Nanjing University, China"}]},{"given":"Hao","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information Management, Nanjing University, 
China"}]}],"member":"179","published-online":{"date-parts":[[2022,9,8]]},"reference":[{"doi-asserted-by":"publisher","key":"bibr1-01655515221114454","DOI":"10.1093\/ejil\/chr006"},{"doi-asserted-by":"publisher","key":"bibr2-01655515221114454","DOI":"10.21744\/lingcure.v5nS4.1713"},{"doi-asserted-by":"publisher","key":"bibr3-01655515221114454","DOI":"10.1016\/j.culher.2010.01.006"},{"volume-title":"Proceedings of the 2019 CHI conference on human factors in computing systems","author":"Lu Z","first-page":"1","key":"bibr4-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr5-01655515221114454","DOI":"10.1080\/14766825.2019.1693581"},{"doi-asserted-by":"publisher","key":"bibr6-01655515221114454","DOI":"10.1016\/j.jvlc.2018.06.005"},{"doi-asserted-by":"publisher","key":"bibr7-01655515221114454","DOI":"10.1109\/MIS.2018.111144858"},{"key":"bibr8-01655515221114454","first-page":"42","volume":"9","author":"Feng Y","year":"2021","journal-title":"Open J Soc Sci"},{"unstructured":"Luo Z, Xu H, Chen F. 
Audio sentiment analysis by heterogeneous signal features learned from utterance-based parallel neural network, http:\/\/ceur-ws.org\/Vol-2328\/3_2_paper_17.pdf","key":"bibr9-01655515221114454"},{"volume-title":"Proceedings of the 3rd international conference on computing and big data","author":"Quilingking Tomas J","first-page":"78","key":"bibr10-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr11-01655515221114454","DOI":"10.1007\/978-981-33-4299-6_14"},{"volume-title":"Proceedings of the 25th ACM international conference on multimedia","author":"Cummins N","first-page":"478","key":"bibr12-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr13-01655515221114454","DOI":"10.1016\/j.inffus.2021.06.003"},{"doi-asserted-by":"publisher","key":"bibr14-01655515221114454","DOI":"10.1016\/j.ipm.2019.02.018"},{"doi-asserted-by":"publisher","key":"bibr15-01655515221114454","DOI":"10.1016\/j.neucom.2021.02.020"},{"doi-asserted-by":"publisher","key":"bibr16-01655515221114454","DOI":"10.1016\/j.neucom.2020.10.021"},{"doi-asserted-by":"publisher","key":"bibr17-01655515221114454","DOI":"10.1016\/j.knosys.2019.01.019"},{"doi-asserted-by":"publisher","key":"bibr18-01655515221114454","DOI":"10.1162\/neco.1997.9.8.1735"},{"volume-title":"Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)","author":"Devlin J","first-page":"4171","key":"bibr19-01655515221114454"},{"unstructured":"Simonyan K, Zisserman A. 
Very deep convolutional networks for large-scale image recognition, 2015, https:\/\/arxiv.org\/abs\/1409.1556v6","key":"bibr20-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr21-01655515221114454","DOI":"10.1109\/TASLP.2017.2678164"},{"volume-title":"Proceedings of the 12th language resources and evaluation conference","author":"Chen E","first-page":"6549","key":"bibr22-01655515221114454"},{"volume-title":"Proceedings of the 3rd workshop on sentiment analysis where AI meets psychology","author":"Patra BG","first-page":"24","key":"bibr23-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr24-01655515221114454","DOI":"10.1007\/978-3-319-61578-3_20"},{"volume-title":"Proceedings of the 2021 4th international conference on robotics, control and automation engineering (RCAE), Online","author":"Zhen C","first-page":"156","key":"bibr25-01655515221114454"},{"volume-title":"Proceedings of the 2020 international conference on artificial intelligence and signal processing (AISP)","author":"Aziz Md. A","first-page":"1","key":"bibr26-01655515221114454"},{"unstructured":"Hu X, Downie JS. When lyrics outperform audio for music mood classification: a feature analysis. In: Proceedings of the 11th international society for music information retrieval conference, ISMIR 2010, Utrecht, 9\u201313 August 2010, pp. 
619\u2013624, https:\/\/archives.ismir.net\/ismir2010\/2010_ISMIR_Proceedings.pdf","key":"bibr27-01655515221114454"},{"volume-title":"An exploration of mood classification in the million songs dataset","year":"2015","author":"Corona H","key":"bibr28-01655515221114454"},{"volume-title":"Proceedings of the 11th workshop on computational approaches to subjectivity, sentiment and social media analysis","author":"Edmonds D","first-page":"221","key":"bibr29-01655515221114454"},{"volume-title":"Proceedings of the 2020 international conference on data mining workshops (ICDMW)","author":"Nath D","first-page":"39","key":"bibr30-01655515221114454"},{"volume-title":"Proceedings of the 12th international conference on natural language processing","author":"Patra BG","first-page":"261","key":"bibr31-01655515221114454"},{"volume-title":"Proceedings of the 2018 IEEE\/ACIS 17th international conference on computer and information science (ICIS)","author":"Wu X","first-page":"361","key":"bibr32-01655515221114454"},{"volume-title":"Proceedings of the 2018 first international conference on secure cyber computing and communication (ICSCCC)","author":"Ahuja M","first-page":"223","key":"bibr33-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr34-01655515221114454","DOI":"10.1007\/978-981-15-6318-8_34"},{"doi-asserted-by":"publisher","key":"bibr35-01655515221114454","DOI":"10.1007\/978-981-33-6881-1_19"},{"key":"bibr36-01655515221114454","first-page":"2331","volume-title":"Proceedings of the 2020 IEEE 4th information technology, networking, electronic and automation control conference (ITNEC)","volume":"1","author":"Liu G"},{"volume-title":"Proceedings of the 2021 IEEE 22nd international conference on information reuse and integration for data science (IRI)","author":"Sung BH","first-page":"437","key":"bibr37-01655515221114454"},{"unstructured":"Delbouys R, Hennequin R, Piccoli F et al. 
Music mood detection based on audio and lyrics with deep neural net, 2018, https:\/\/arxiv.org\/abs\/1809.07276","key":"bibr38-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr39-01655515221114454","DOI":"10.1007\/978-981-13-8707-4_3"},{"doi-asserted-by":"publisher","key":"bibr40-01655515221114454","DOI":"10.1016\/j.inffus.2017.02.003"},{"doi-asserted-by":"publisher","key":"bibr41-01655515221114454","DOI":"10.1016\/j.imavis.2017.08.003"},{"doi-asserted-by":"publisher","key":"bibr42-01655515221114454","DOI":"10.1016\/j.knosys.2022.108580"},{"doi-asserted-by":"publisher","key":"bibr43-01655515221114454","DOI":"10.1016\/j.neucom.2015.01.095"},{"doi-asserted-by":"publisher","key":"bibr44-01655515221114454","DOI":"10.1016\/j.inffus.2022.03.001"},{"key":"bibr45-01655515221114454","first-page":"10790","volume":"35","author":"Yu W","year":"2021","journal-title":"Proc AAAI Conf Artif Intell"},{"volume-title":"Proceedings of the 58th annual meeting of the association for computational linguistics","author":"Rahman W","first-page":"2359","key":"bibr46-01655515221114454"},{"volume-title":"Proceedings of the 2021 conference on empirical methods in natural language processing","author":"Han W","first-page":"9180","key":"bibr47-01655515221114454"},{"unstructured":"Mai S, Zeng Y, Zheng S et al. Hybrid contrastive learning of tri-modal representation for multimodal sentiment analysis. 
IEEE Trans Affect Comput 2022, https:\/\/arxiv.org\/pdf\/2109.01797.pdf","key":"bibr48-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr49-01655515221114454","DOI":"10.1016\/j.inffus.2020.08.006"},{"doi-asserted-by":"publisher","key":"bibr50-01655515221114454","DOI":"10.1007\/s10844-018-0497-4"},{"volume-title":"Proceedings of the 2017 7th international conference on affective computing and intelligent interaction workshops and demos (ACIIW)","author":"Amiriparian S","first-page":"26","key":"bibr51-01655515221114454"},{"volume-title":"Proceedings of the 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP)","author":"Cummins N","first-page":"4954","key":"bibr52-01655515221114454"},{"volume-title":"Proceedings of the 58th annual meeting of the association for computational linguistics","author":"Li X","first-page":"5849","key":"bibr53-01655515221114454"},{"volume-title":"Proceedings of the 58th annual meeting of the association for computational linguistics","author":"Yu J","first-page":"3342","key":"bibr54-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr55-01655515221114454","DOI":"10.1016\/j.neucom.2018.05.104"},{"doi-asserted-by":"publisher","key":"bibr56-01655515221114454","DOI":"10.1609\/aaai.v33i01.3301305"},{"doi-asserted-by":"publisher","key":"bibr57-01655515221114454","DOI":"10.1609\/aaai.v32i1.11962"},{"unstructured":"Panda RES, Malheiro R, Rocha B et al. Multi-modal music emotion recognition: a new dataset, methodology and comparative analysis. In: Proceedings of the 10th international symposium on computer music multidisciplinary research (CMMR 2013), 2013, pp. 
570\u2013582, https:\/\/www.researchgate.net\/publication\/257409136_Multi-Modal_Music_Emotion_Recognition_A_New_Dataset_Methodology_and_Comparative_Analysis","key":"bibr58-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr59-01655515221114454","DOI":"10.2139\/ssrn.3554826"},{"volume-title":"Proceedings of the 9th ACM international conference on web search and data mining","author":"You Q","first-page":"13","key":"bibr60-01655515221114454"},{"doi-asserted-by":"publisher","key":"bibr61-01655515221114454","DOI":"10.18653\/v1\/D17-1115"},{"doi-asserted-by":"publisher","key":"bibr62-01655515221114454","DOI":"10.1016\/j.imavis.2017.01.011"},{"doi-asserted-by":"publisher","key":"bibr63-01655515221114454","DOI":"10.1109\/MIS.2013.9"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515221114454","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01655515221114454","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515221114454","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T21:17:56Z","timestamp":1740777476000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01655515221114454"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,8]]},"references-count":63,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.1177\/01655515221114454"],"URL":"https:\/\/doi.org\/10.1177\/01655515221114454","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"type":"print","value":"0165-5515"},{"type":"electronic","value":"1741-6485"}],"subject":[],"published":{"date-parts":[[2022,9,8]]}}}