{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T17:59:10Z","timestamp":1764784750881,"version":"build-2065373602"},"reference-count":48,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T00:00:00Z","timestamp":1635206400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2020R1A2B5B02002770"],"award-info":[{"award-number":["NRF-2020R1A2B5B02002770"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Watching videos online has become part of a relaxed lifestyle. The music in videos has a sensitive influence on human emotions, perception, and imaginations, which can make people feel relaxed or sad, and so on. Therefore, it is particularly important for people who make advertising videos to understand the relationship between the physical elements of music and empathy characteristics. The purpose of this paper is to analyze the music features in an advertising video and extract the music features that make people empathize. This paper combines both methods of the power spectrum of MFCC and image RGB analysis to find the audio feature vector. In spectral analysis, the eigenvectors obtained in the analysis process range from blue (low range) to green (medium range) to red (high range). The machine learning random forest classifier is used to classify the data obtained by machine learning, and the trained model is used to monitor the development of an advertisement empathy system in real time. 
The result is that the optimal model is obtained with the training accuracy result of 99.173% and a test accuracy of 86.171%, which can be deemed as correct by comparing the three models of audio feature value analysis. The contribution of this study can be summarized as follows: (1) the low-frequency and high-amplitude audio in the video is more likely to resonate than the high-frequency and high-amplitude audio; (2) it is found that frequency and audio amplitude are important attributes for describing waveforms by observing the characteristics of the machine learning classifier; (3) a new audio extraction method is proposed to induce human empathy. That is, the feature value extracted by the method of spectrogram image features of audio has the most ability to arouse human empathy.<\/jats:p>","DOI":"10.3390\/s21217111","type":"journal-article","created":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T23:54:33Z","timestamp":1635292473000},"page":"7111","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["An Empathy Evaluation System Using Spectrogram Image Features of Audio"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5230-6285","authenticated-orcid":false,"given":"Jing","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Emotion Engineering, University of Sangmyung, Seoul 03016, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xingyu","family":"Wen","sequence":"additional","affiliation":[{"name":"Department of Emotion Engineering, University of Sangmyung, Seoul 03016, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0035-3853","authenticated-orcid":false,"given":"Ayoung","family":"Cho","sequence":"additional","affiliation":[{"name":"Department of Emotion Engineering, University of Sangmyung, Seoul 03016, 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mincheol","family":"Whang","sequence":"additional","affiliation":[{"name":"Department of Human Centered Artificial Intelligence, University of Sangmyung, Seoul 03016, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,10,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"988","DOI":"10.1037\/0012-1649.32.6.988","article-title":"Empathy in conduct-disordered and comparison youth","volume":"32","author":"Cohen","year":"1996","journal-title":"Dev. Psychol."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Dzedzickis, A., Kaklauskas, A., and Bucinskas, V. (2020). Human Emotion Recognition: Review of Sensors and Methods. Sensors, 20.","DOI":"10.3390\/s20030592"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1016\/j.optlaseng.2019.06.011","article-title":"High-accuracy multi-camera reconstruction enhanced by adaptive point cloud correction algorithm","volume":"122","author":"Chen","year":"2019","journal-title":"Opt. Lasers Eng."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Gomes, P.V., S\u00e1, V.J., Marques, A., Donga, J., Correia, A., and Loureiro, J.P. (2020). Creating Emotions Through Digital Media Art: Building Empathy in Immersive Environments. Multidisciplinary Perspectives on New Media Art, IGI Global.","DOI":"10.4018\/978-1-7998-3669-8.ch007"},{"key":"ref_5","unstructured":"Stein, S.J., and Book, H.E. (2011). The EQ Edge: Emotional Intelligence and Your Success, John Wiley & Sons."},{"key":"ref_6","unstructured":"Jordan, P.W. (2002). Designing Pleasurable Products: An Introduction to the New Human Factors, CRC Press."},{"key":"ref_7","unstructured":"Alexander, R., Dias, S., Hancock, K.S., Leung, E.Y., Macrae, D., Ng, A.Y., O\u2019Neil, S., Schoaff, P.C., Sutton, J., and Ward, T.E. (2001). 
Systems and Methods for Displaying and Recording Control Interface with Television Programs, Video, Advertising Information and Program Scheduling Information. (No. 6,177,931), U.S. Patent."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1108\/JAMR-05-2017-0065","article-title":"Emotions as predictor for consumer engagement in YouTube advertisement","volume":"15","author":"Kujur","year":"2018","journal-title":"J. Adv. Manag. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1509\/jmr.13.0593","article-title":"Predicting Advertising success beyond Traditional Measures: New Insights from Neurophysiological Methods and Market Response Modeling","volume":"52","author":"Venkatraman","year":"2015","journal-title":"J. Mark. Res."},{"key":"ref_10","first-page":"8","article-title":"Negotiating the Challenge of Outcome-Based Education","volume":"51","year":"1994","journal-title":"Sch. Adm."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1016\/j.neubiorev.2010.10.009","article-title":"Is there a core neural network in empathy? An fMRI based quantitative meta-analysis","volume":"35","author":"Fan","year":"2011","journal-title":"Neurosci. Biobehav. Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"457","DOI":"10.5195\/cinej.2020.289","article-title":"Review of Audio-Vision: Sound on Screen","volume":"8","author":"Poulakis","year":"2020","journal-title":"CINEJ Cin\u00e9. J."},{"key":"ref_13","unstructured":"Rebello, S. (2010). Alfred Hitchcock and the Making of Psycho, Open Road Media."},{"key":"ref_14","unstructured":"Coulthard, L. (2017). Sound and Contemporary Screen Violence. The Routledge Companion to Screen Music and Sound, Routledge."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1037\/a0027392","article-title":"Is empathy related to the perception of emotional expression in music? 
A multimodal time-series analysis","volume":"6","year":"2012","journal-title":"Psychol. Aesthet. Creat. Arts"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Mera, M., Sadoff, R., and Winters, B. (2017). The Routledge Companion to Screen Music and Sound, Taylor & Francis.","DOI":"10.4324\/9781315681047"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1109\/TSA.2005.860344","article-title":"Automatic mood detection and tracking of music audio signals","volume":"14","author":"Lu","year":"2005","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1037\/0022-3514.78.1.173","article-title":"Nature over nurture: Temperament, personality, and life span development","volume":"78","author":"McCrae","year":"2000","journal-title":"J. Personal. Soc. Psychol."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Scheirer, E.D., and Slaney, M. (2003). Multi-Feature Speech\/Music Discrimination System. (No. 6,570,991), U.S. Patent.","DOI":"10.1121\/1.1852985"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1109\/TSA.2002.800560","article-title":"Musical genre classification of audio signals","volume":"10","author":"Tzanetakis","year":"2002","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1109\/LSP.2010.2100380","article-title":"Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions","volume":"18","author":"Dennis","year":"2010","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Janghel, R.R., Sahu, S.P., Rathore, Y.K., Singh, S., and Pawar, U. (2019). Application of Deep Learning in Speech Recognition. 
Handbook of Research on Deep Learning Innovations and Trends, IGI Global.","DOI":"10.4018\/978-1-5225-7862-8.ch004"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Yoo, S., and Whang, M. (2020). Vagal Tone Differences in Empathy Level Elicited by Different Emotions and a Co-Viewer. Sensors, 20.","DOI":"10.3390\/s20113136"},{"key":"ref_24","first-page":"99","article-title":"Exploring the Response to the Anti-Smoking Advertisements: Ad Liking, Empathy, and Psychological Resistance","volume":"5","author":"Soh","year":"2019","journal-title":"J. Converg. Cult. Technol."},{"key":"ref_25","unstructured":"Britto, A., Gouyon, F., and Dixon, S. (2013, January 4\u20138). Essentia: An audio analysis library for music information retrieval. Proceedings of the 14th Conference of the International Society for Music Information Retrieval (ISMIR), Curitiba, Brazil."},{"key":"ref_26","first-page":"1","article-title":"A large set of audio features for sound description (similarity and classification) in the CUIDADO project. Technical Report","volume":"54","author":"Peeters","year":"2004","journal-title":"CUIDADO Ist Proj. Rep."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1917","DOI":"10.1121\/1.1458024","article-title":"YIN, a fundamental frequency estimator for speech and music","volume":"111","author":"Kawahara","year":"2002","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_28","first-page":"26","article-title":"GrooveNet: Real-time music-driven dance movement generation using artificial neural networks","volume":"8","author":"Alemi","year":"2017","journal-title":"Networks"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"17","DOI":"10.4018\/IJMDEM.2021010102","article-title":"Multimodal Dance Generation Networks Based on Audio-Visual Analysis","volume":"12","author":"Duan","year":"2021","journal-title":"Int. J. Multimed. Data Eng. 
Manag."},{"key":"ref_30","first-page":"103","article-title":"Effects of Storytelling in Advertising on Consumers\u2019 Empathy","volume":"15","author":"Park","year":"2014","journal-title":"Asia Mark. J."},{"key":"ref_31","unstructured":"Smith, J.O. (2002). Mathematics of the Discrete Fourier Transform (DFT), Center for Computer Research in Music and Acoustics (CCRMA), Department of Music, Stanford University."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.1016\/j.ygeno.2019.07.002","article-title":"Heuristic filter feature selection methods for medical datasets","volume":"112","author":"Alirezanejad","year":"2019","journal-title":"Genomics"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4080","DOI":"10.1109\/TIE.2017.2758745","article-title":"Short-time Fourier transform based transient analysis of VSC interfaced point-to-point DC system","volume":"65","author":"Satpathi","year":"2017","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Stolar, M.N., Lech, M., Bolia, R.S., and Skinner, M. (2017, January 13\u201315). Real time speech emotion recognition using RGB image classification and transfer learning. Proceedings of the 2017 11th International Conference on Signal Processing and Communication Systems (ICSPCS), Surfers Paradise, QLD, Australia.","DOI":"10.1109\/ICSPCS.2017.8270472"},{"key":"ref_35","first-page":"159","article-title":"Sound Visualization for Deaf Assistance Using Mobile Computing","volume":"2","author":"Alhabbash","year":"2015","journal-title":"J. Eng. Res. Technol."},{"key":"ref_36","first-page":"14","article-title":"A Survey Report on Text Classification with Different Term Weighing Methods and Comparison between Classification Algorithms","volume":"75","author":"Patra","year":"2013","journal-title":"Int. J. Comput. Appl."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Kononenko, I. (1991). Semi-Naive Bayesian Classifier. 
European Working Session on Learning, Springer.","DOI":"10.1007\/BFb0017015"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"He, Q., Xu, Z., Li, S., Li, R., Zhang, S., Wang, N., Pham, B.T., and Chen, W. (2019). Novel Entropy and Rotation Forest-Based Credal Decision Tree Classifier for Landslide Susceptibility Modeling. Entropy, 21.","DOI":"10.3390\/e21020106"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.bspc.2006.05.002","article-title":"Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network","volume":"1","author":"Chaplot","year":"2006","journal-title":"Biomed. Signal Process. Control."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1080\/00401706.1996.10484565","article-title":"The Nature of Statistical Learning Theory","volume":"38","author":"Sain","year":"1996","journal-title":"Technometrics"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wasule, V., and Sonar, P. (2017, January 4\u20135). Classification of brain MRI using SVM and KNN classifier. Proceedings of the 2017 Third International Conference on Sensing, Signal Processing and Security (ICSSS), Chennai, India.","DOI":"10.1109\/SSPS.2017.8071594"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"3326","DOI":"10.1016\/j.neucom.2008.01.031","article-title":"Evolutionary tuning of SVM parameter values in multiclass problems","volume":"71","author":"Lorena","year":"2008","journal-title":"Neurocomputing"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1676","DOI":"10.1016\/j.neucom.2009.11.042","article-title":"Study and evaluation of a multi-class SVM classifier using diminishing learning technique","volume":"73","author":"Manikandan","year":"2010","journal-title":"Neurocomputing"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Shuai, Y., Zheng, Y., and Huang, H. (2018, January 23\u201325). 
Hybrid Software Obsolescence Evaluation Model Based on PCA-SVM-GridSearchCV. Proceedings of the 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China.","DOI":"10.1109\/ICSESS.2018.8663753"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Huang, Z., Dong, M., Mao, Q., and Zhan, Y. (2014, January 3\u20137). Speech emotion recognition using CNN. Proceedings of the 22nd ACM international conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654984"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2203","DOI":"10.1109\/TMM.2014.2360798","article-title":"Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks","volume":"16","author":"Mao","year":"2014","journal-title":"IEEE Trans. Multimed."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Lim, W., Jang, D., and Lee, T. (2016, January 13\u201316). Speech emotion recognition using convolutional and recurrent neural networks. Proceedings of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Jeju, Korea.","DOI":"10.1109\/APSIPA.2016.7820699"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Zhang, J., Wen, X., and Whang, M. (2020). Recognition of Emotion According to the Physical Elements of the Video. 
Sensors, 20.","DOI":"10.3390\/s20030649"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/21\/7111\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:24:24Z","timestamp":1760167464000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/21\/7111"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,26]]},"references-count":48,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["s21217111"],"URL":"https:\/\/doi.org\/10.3390\/s21217111","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,10,26]]}}}