{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T11:05:01Z","timestamp":1776078301123,"version":"3.50.1"},"reference-count":69,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2020,9,17]],"date-time":"2020-09-17T00:00:00Z","timestamp":1600300800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,9,17]],"date-time":"2020-09-17T00:00:00Z","timestamp":1600300800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Affective computing is an emerging area of research that aims to enable intelligent systems to recognize, feel, infer and interpret human emotions. Widely spread online and off-line music videos are a rich source for human emotion analysis because they integrate the composer\u2019s internal feelings through song lyrics, musical instrument performance and visual expression. In general, the metadata that music video customers use to choose a product includes high-level semantics such as emotion, so automatic emotion analysis might be necessary. In this research area, however, the lack of a labeled dataset is a major problem. Therefore, we first construct a balanced music video emotion dataset covering a diversity of territories, languages, cultures and musical instruments. We test this dataset over four unimodal and four multimodal convolutional neural networks (CNN) of music and video. First, we separately fine-tune each pre-trained unimodal CNN and test its performance on unseen data.
In addition, we train a 1-dimensional CNN-based music emotion classifier with raw waveform input. A comparative analysis of each unimodal classifier over various optimizers is made to find the best model that can be integrated into a multimodal structure. The best unimodal modality is integrated with the corresponding music and video network features to form a multimodal classifier. The multimodal structure integrates whole music video features and makes the final classification with a SoftMax classifier using a late feature fusion strategy. All possible multimodal structures are also combined into one predictive model to obtain the overall prediction. All the proposed multimodal structures use cross-validation to overcome the data scarcity problem (overfitting) at the decision level. The evaluation results using various metrics show a boost in the performance of the multimodal architectures compared to each unimodal emotion classifier. The predictive model integrating all multimodal structures achieves 88.56% accuracy, an f1-score of 0.88, and an area under the curve (AUC) score of 0.987.
The results suggest that high-level human emotions are automatically well classified by the proposed CNN-based multimodal networks, even though only a small amount of labeled data is available for training.<\/jats:p>","DOI":"10.1007\/s11042-020-08836-3","type":"journal-article","created":{"date-parts":[[2020,9,17]],"date-time":"2020-09-17T15:15:21Z","timestamp":1600355721000},"page":"2887-2905","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":185,"title":["Deep learning-based late fusion of multimodal information for emotion classification of music video"],"prefix":"10.1007","volume":"80","author":[{"given":"Yagya Raj","family":"Pandeya","sequence":"first","affiliation":[]},{"given":"Joonwhoan","family":"Lee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,9,17]]},"reference":[{"key":"8836_CR1","unstructured":"Bahuleyan H (2018) Music genre classification using machine learning techniques. arXiv:1804.01149v1"},{"key":"8836_CR2","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1109\/TPAMI.2018.2798607","volume":"41","author":"T Baltrusaitis","year":"2018","unstructured":"Baltrusaitis T, Ahuja C, Morency LP (2018) Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 41:423\u2013443","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"8836_CR3","doi-asserted-by":"crossref","unstructured":"Bottou L (2010) Large-scale machine learning with stochastic gradient descent. Springer proceedings of COMPSTAT\u20192010 177\u2013186","DOI":"10.1007\/978-3-7908-2604-3_16"},{"key":"8836_CR4","doi-asserted-by":"crossref","unstructured":"Carreira J, and Zisserman A (2018) Quo vadis, action recognition? A new model and the kinetics dataset.
arXiv:1705.07750v3","DOI":"10.1109\/CVPR.2017.502"},{"key":"8836_CR5","doi-asserted-by":"crossref","unstructured":"Chang WY, Hsu SH, and Chien JH (2017) FATAUVA-net: an integrated deep learning framework for facial attribute recognition, action unit detection, and valence-arousal estimation. IEEE 2160-7516","DOI":"10.1109\/CVPRW.2017.246"},{"key":"8836_CR6","unstructured":"Choi K, Fazekas G, Sandler M and Cho K (2017) Transfer learning for music classification and regression tasks. International Society for Music Information Retrieval Conference, Suzhou, China 141\u2013149"},{"key":"8836_CR7","unstructured":"Clevert DA, Unterthiner T and Hochreiter S (2016) Fast and accurate deep network learning by exponential linear units (elus). arXiv:1511.07289"},{"issue":"38","key":"8836_CR8","doi-asserted-by":"publisher","first-page":"E7900","DOI":"10.1073\/pnas.1702247114","volume":"114","author":"AS Cowen","year":"2017","unstructured":"Cowen AS, Keltner D (2017) Self-report captures 27 distinct categories of emotion bridged by continuous gradients. PNAS 114(38):E7900\u2013E7909","journal-title":"PNAS"},{"key":"8836_CR9","doi-asserted-by":"crossref","unstructured":"Dai W, Dai C, Qu S, Li J, and Das S (2016) Very deep convolutional neural networks for raw waveforms. arXiv:1610.00087v1","DOI":"10.1109\/ICASSP.2017.7952190"},{"key":"8836_CR10","doi-asserted-by":"crossref","unstructured":"Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition:1063\u20136919","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"8836_CR11","volume-title":"Audio and face video emotion recognition in the wild using deep neural networks and small datasets. 
International conference on multimodal interfaces","author":"W Ding","year":"2016","unstructured":"Ding W, Xu M, Huang D, Lin W, Dong M, Yu X, Li H (2016) Audio and face video emotion recognition in the wild using deep neural networks and small datasets. International conference on multimodal interfaces. Tokyo, Japan"},{"key":"8836_CR12","unstructured":"Elshaer MEA, Wisdom S, Mishra T (2019) Transfer learning from sound representations for anger detection in speech. arXiv:1902.02120v1"},{"key":"8836_CR13","volume-title":"Video-based emotion recognition using CNN-RNN and C3D hybrid networks. International conference on multimodal interfaces","author":"Y Fan","year":"2016","unstructured":"Fan Y, Lu X, Li D, Liu Y (2016) Video-based emotion recognition using CNN-RNN and C3D hybrid networks. International conference on multimodal interfaces. Tokyo, Japan"},{"key":"8836_CR14","doi-asserted-by":"crossref","unstructured":"Fridman L, Brown DE, Glazer M, Angell W, Dodd S, Jenik B, Terwilliger J, Patsekin A, Kindelsberger J, Ding L, Seaman S, Mehler A, Sipperley A, Pettinato A, Seppelt B, Angell L, Mehler B, and Reimer B (2019) MIT advanced vehicle technology study: large-scale naturalistic driving study of driver behavior and interaction with automation. arXiv:1711.06976v4","DOI":"10.1109\/ACCESS.2019.2926040"},{"key":"8836_CR15","doi-asserted-by":"crossref","unstructured":"Gao Z, Xuan HZ, Zhang H, Wan S and Choo KKR (2018) Adaptive fusion and category-level dictionary learning model for multi-view human action recognition. IEEE Internet of Things Journal","DOI":"10.1109\/JIOT.2019.2911669"},{"key":"8836_CR16","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1016\/j.future.2018.12.039","volume":"94","author":"Z Gao","year":"2019","unstructured":"Gao Z, Wang YL, Wan SH, Wang DY, Zhang H (2019) Cognitive-inspired class-statistic matching with triple-constrain for camera free 3D object retrieval. 
Futur Gener Comput Syst 94:641\u2013653","journal-title":"Futur Gener Comput Syst"},{"key":"8836_CR17","unstructured":"Garces, MLE (2018) Transfer learning for illustration classification, arXiv:1806.02682v1"},{"key":"8836_CR18","doi-asserted-by":"crossref","unstructured":"Grekow J (2018) From content-based music emotion recognition to emotion maps of musical pieces. Springer","DOI":"10.1007\/978-3-319-70609-2"},{"key":"8836_CR19","doi-asserted-by":"crossref","unstructured":"Hahnloser RHR, Sarpeshkar R, Mahowald MA, Douglas RJ, and Seung SH (2000) Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405-6789-947","DOI":"10.1038\/35016072"},{"key":"8836_CR20","volume-title":"Lecture 6d - a separate, adaptive learning rate for each connection","author":"G Hinton","year":"2012","unstructured":"Hinton G, Srivastava N, and Swersky K (2012) Lecture 6d - a separate, adaptive learning rate for each connection. Slides of Lecture Neural Networks for Machine Learning."},{"key":"8836_CR21","unstructured":"Hong S, Im W, and Yang HS (2017) Content-based video\u2013music retrieval using soft intra-modal structure constraint. arXiv:1704.06761v2."},{"key":"8836_CR22","doi-asserted-by":"crossref","unstructured":"Hussain M, Bird JJ, Faria DR (2018) A study on CNN transfer learning for image classification. UKCI 2018: Advances In Intelligent Systems and Computing, (840) 191-202 Springer","DOI":"10.1007\/978-3-319-97982-3_16"},{"key":"8836_CR23","doi-asserted-by":"crossref","unstructured":"Kahou SE, Bouthillier X, Lamblin P, Gulcehre C and at al. (2015) EmoNets: Multimodal deep learning approaches for emotion recognition in video. arXiv:1503.01800v2.","DOI":"10.1007\/s12193-015-0195-2"},{"key":"8836_CR24","doi-asserted-by":"crossref","unstructured":"Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. 
IEEE conference on Computer Vision and Pattern Recognition:1725\u20131732","DOI":"10.1109\/CVPR.2014.223"},{"key":"8836_CR25","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223","volume-title":"Large-scale video classification with convolutional neural networks","author":"A Karpathy","year":"2014","unstructured":"Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R and Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"8836_CR26","unstructured":"Kay W, Carreira J, Simonyan K, Zhang B, Hillier C, Vijayanarasimhan S, Viola F, Green T, Back T, Natsev P, Suleyman M, and Zisserman A (2017) The kinetics human action video dataset. arXiv:1705.06950"},{"key":"8836_CR27","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1016\/j.imavis.2017.01.012","volume":"65","author":"H Kaya","year":"2017","unstructured":"Kaya H, G\u00fcrp\u0131nar F, Salah AA (2017) Video-based emotion recognition in the wild using deep transfer learning and score fusion. Image Vis Comput 65:66\u201375","journal-title":"Image Vis Comput"},{"key":"8836_CR28","unstructured":"Kingma D and Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980"},{"key":"8836_CR29","doi-asserted-by":"crossref","unstructured":"Koelstra S, M\u00fchl C, Soleymani M, Lee JS, Yazdani A, Ebrahimi T, Pun T, Nijholt N, and Patras I (2012) DEAP: a database for emotion analysis using physiological signals. IEEE Trans Affect Comput","DOI":"10.1109\/T-AFFC.2011.15"},{"key":"8836_CR30","doi-asserted-by":"crossref","unstructured":"Kunze J, Kirsch L, Kurenkov I, Krug A, Johannsmeier J, and Stober S (2017) Transfer learning for speech recognition on a budget.
arXiv:1706.00290v1","DOI":"10.18653\/v1\/W17-2620"},{"key":"8836_CR31","doi-asserted-by":"publisher","unstructured":"Lee J, Park J, Kim KL, Nam J (2018) SampleCNN: end-to-end deep convolutional neural networks using very small filters for music classification. Applied science. https:\/\/doi.org\/10.3390\/app8010150","DOI":"10.3390\/app8010150"},{"key":"8836_CR32","unstructured":"Liu X, Chen Q, Wu X, Yan L, Ann Yang L (2017) CNN based music emotion classification. arXiv:1704.05665"},{"key":"8836_CR33","doi-asserted-by":"publisher","first-page":"341","DOI":"10.1016\/j.mehy.2011.11.016","volume":"78","author":"H L\u00f6vheim","year":"2012","unstructured":"L\u00f6vheim H (2012) A new three-dimensional model for emotions and monoamine neurotransmitters. Med Hypotheses 78:341\u2013348","journal-title":"Med Hypotheses"},{"key":"8836_CR34","doi-asserted-by":"publisher","first-page":"184","DOI":"10.1016\/j.inffus.2018.06.003","volume":"46","author":"Y Ma","year":"2019","unstructured":"Ma Y, Hao Y, Chen M, Chen J, Lu P, Ko\u0161ir A (2019) Audio-visual emotion fusion (AVEF): a deep efficient weighted approach. Information Fusion 46:184\u2013192","journal-title":"Information Fusion"},{"key":"8836_CR35","unstructured":"Mahieux TB, Ellis DP, Whitman B, and Lamere P (2011) The million song dataset. 12th international conference on music information retrieval, Miami FL 591-596"},{"key":"8836_CR36","unstructured":"Minaee S and Abdolrashidi A (2019) Deep-emotion: facial expression recognition using attentional convolutional network. arXiv:1902.01019v1"},{"key":"8836_CR37","unstructured":"Ng JY, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: deep networks for video classification. 
IEEE conference on computer vision and pattern recognition:4694\u20134702"},{"key":"8836_CR38","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2017.140","volume-title":"Deep spatio-temporal features for multimodal emotion recognition","author":"D Nguyen","year":"2017","unstructured":"Nguyen D, Nguyen K, Sridharan S, Ghasemi A, Dean D and Fookes C (2017) Deep spatio-temporal features for multimodal emotion recognition. IEEE Winter Conference on Applications of Computer Vision"},{"key":"8836_CR39","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1007\/s10772-017-9396-2","volume":"20","author":"F Noroozi","year":"2017","unstructured":"Noroozi F, Sapi\u0144ski T, Kami\u0144ska D, Anbarjafari G (2017) Vocal-based emotion recognition using random forests and decision tree. International Journal of Speech Technology 20:239\u2013246","journal-title":"International Journal of Speech Technology"},{"key":"8836_CR40","unstructured":"Ortega JDS, Senoussaoui M, Granger E, and Pedersoli M (2019) Multimodal fusion with deep neural networks for audio-video emotion recognition. arXiv:1907.03196v1."},{"key":"8836_CR41","volume-title":"Audio-visual emotion recognition using deep transfer learning and multiple temporal models. International conference on multimodal interfaces","author":"X Ouyang","year":"2017","unstructured":"Ouyang X, Kawaai S, Goh EGH, Shen S, Ding W, Ming H, Huang DY (2017) Audio-visual emotion recognition using deep transfer learning and multiple temporal models. International conference on multimodal interfaces. Glasgow, UK"},{"key":"8836_CR42","doi-asserted-by":"publisher","first-page":"154","DOI":"10.5391\/IJFIS.2018.18.2.154","volume":"18-2","author":"YR Pandeya","year":"2018","unstructured":"Pandeya YR, Lee J (2018) Domestic cat sound classification using transfer learning.
International Journal of Fuzzy Logic and Intelligent Systems 18-2:154\u2013160","journal-title":"International Journal of Fuzzy Logic and Intelligent Systems"},{"key":"8836_CR43","doi-asserted-by":"crossref","unstructured":"Pandeya YR, Kim D, and Lee J (2018) Domestic cat sound classification using learned features from deep neural nets. Applied science 1949","DOI":"10.3390\/app8101949"},{"key":"8836_CR44","volume-title":"Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild. International conference on multimodal interfaces","author":"S Pini","year":"2017","unstructured":"Pini S, Ben-Ahmed O, Cornia M, Baraldi L, Cucchiara R, Huet B (2017) Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild. International conference on multimodal interfaces. Glasgow, UK"},{"key":"8836_CR45","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1016\/j.inffus.2017.02.003","volume":"37","author":"S Poria","year":"2017","unstructured":"Poria S, Cambria E, Bajpai R, Hussain A (2017) A review of affective computing: from unimodal analysis to multimodal fusion. Information Fusion 37:98\u2013125","journal-title":"Information Fusion"},{"key":"8836_CR46","doi-asserted-by":"crossref","unstructured":"Ringeval F, Sonderegger A, Sauer J, and Lalanne D (2013) Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).","DOI":"10.1109\/FG.2013.6553805"},{"key":"8836_CR47","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6637858","volume-title":"Robust EEG emotion classification using segment level decision fusion","author":"V Rozgic","year":"2013","unstructured":"Rozgic V, Vitaladevuni SN, Prasad R (2013) Robust EEG emotion classification using segment level decision fusion. 
IEEE International Conference on Acoustics, Speech and Signal Processing"},{"key":"8836_CR48","doi-asserted-by":"publisher","first-page":"1161","DOI":"10.1037\/h0077714","volume":"39-6","author":"JA Russell","year":"1980","unstructured":"Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39-6:1161\u20131178","journal-title":"J Pers Soc Psychol"},{"key":"8836_CR49","doi-asserted-by":"crossref","unstructured":"Shiqing Z, Shiliang Z, Huang T, Gao W, Tian Q (2018) Learning affective features with a hybrid deep model for audio-visual emotion recognition. IEEE Transactions on Circuits and Systems for Video Technology:28\u201310","DOI":"10.1109\/TCSVT.2017.2719043"},{"key":"8836_CR50","first-page":"1929","volume":"15-1","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15-1:1929\u20131958","journal-title":"J Mach Learn Res"},{"key":"8836_CR51","unstructured":"Su YC, Chiu TH, Yeh CY, Huang HF, and Hsu WH (2015) Transfer Learning for Video Recognition with Scarce Training Data for Deep Convolutional Neural Network. arXiv:1409.4127v2"},{"key":"8836_CR52","volume-title":"An improved valence-arousal emotion space for video affective content representation and recognition","author":"K Sun","year":"2009","unstructured":"Sun K, Yu J, Huang Y, and Hu X (2009) An improved valence-arousal emotion space for video affective content representation and recognition. IEEE International Conference on Multimedia and Expo"},{"key":"8836_CR53","doi-asserted-by":"crossref","unstructured":"Tan C, Sun F, Kong T, Zhang W, Yang C, and Liu C (2018) A survey on deep transfer learning. arXiv:1808.01974v1","DOI":"10.1007\/978-3-030-01424-7_27"},{"key":"8836_CR54","doi-asserted-by":"crossref","unstructured":"Thayer RE (1989) The biopsychology of mood and arousal. 
Oxford University Press","DOI":"10.1093\/oso\/9780195068276.001.0001"},{"key":"8836_CR55","doi-asserted-by":"publisher","first-page":"1325","DOI":"10.1007\/s11280-018-0548-3","volume":"22","author":"H Tian","year":"2019","unstructured":"Tian H, Tao Y, Pouyanfar S, Chen SC, Shyu ML (2019) Multimodal deep representation learning for video classification. World Wide Web 22:1325\u20131341","journal-title":"World Wide Web"},{"key":"8836_CR56","unstructured":"Tiwari SN, Duong NQK, Lefebvre F, Demarty CH, Huet B and Chevallier L (2016) Deep features for multimodal emotion classification. HAL-01289191."},{"key":"8836_CR57","doi-asserted-by":"crossref","unstructured":"Torrey L, Shavlik J (2009) Transfer learning. IGI Global Publication Handbook of Research on Machine Learning Applications","DOI":"10.4018\/978-1-60566-766-9.ch011"},{"key":"8836_CR58","doi-asserted-by":"crossref","unstructured":"Tran D, Bourdev L, Fergus R, Torresani L, and Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. IEEE International Conference on Computer Vision 4489\u20134497","DOI":"10.1109\/ICCV.2015.510"},{"key":"8836_CR59","unstructured":"Tremblay J, To T, Sundaralingam B, Xiang Y, Fox D, and Birchfield S (2018) Deep object pose estimation for semantic robotic grasping of household objects. arXiv:1809.10790v1"},{"key":"8836_CR60","volume-title":"Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset","author":"S Tripathi","year":"2017","unstructured":"Tripathi S, Acharya S, and Sharma RD (2017) Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset. Twenty-Ninth Association for the Advancement of Artificial Intelligence Conference on Innovative Applications"},{"key":"8836_CR61","doi-asserted-by":"crossref","unstructured":"Tzirakis P, Trigeorgis G, Nicolaou MA, Schuller BW, and Zafeiriou S (2017) End-to-end multimodal emotion recognition using deep neural networks. 
IEEE Journal of selected topics in signal processing 1301-1309","DOI":"10.1109\/JSTSP.2017.2764438"},{"key":"8836_CR62","doi-asserted-by":"crossref","unstructured":"Wang S, Ji Q (2015) Video affective content analysis: a survey of state-of-the-art methods. IEEE Trans Affect Comput","DOI":"10.1109\/TAFFC.2015.2432791"},{"key":"8836_CR63","doi-asserted-by":"crossref","unstructured":"Wang D, Zheng TF (2015) Transfer learning for speech and language processing. APSIPA Annual Summit and Conference 2015","DOI":"10.1109\/APSIPA.2015.7415532"},{"key":"8836_CR64","doi-asserted-by":"crossref","unstructured":"Wu H, Chen Y, Wang N, and Zhang Z (2019) Sequence level semantics aggregation for video object detection. arXiv:1907.06390v2","DOI":"10.1109\/ICCV.2019.00931"},{"key":"8836_CR65","doi-asserted-by":"crossref","unstructured":"Xu YS, Fu TJ, Yang HK, Lee CY (2018) Dynamic video segmentation network. arXiv:1804.00931v2","DOI":"10.1109\/CVPR.2018.00686"},{"key":"8836_CR66","doi-asserted-by":"crossref","unstructured":"Yang YH and Chen HH (2012) Machine recognition of music emotion: a review. ACM transactions on intelligent systems and technology 3-3-40","DOI":"10.1145\/2168752.2168754"},{"key":"8836_CR67","doi-asserted-by":"crossref","unstructured":"Zhang L and Zhang J (2018) Synchronous prediction of arousal and valence using LSTM network for affective video content analysis. arXiv:1806.00257","DOI":"10.1109\/FSKD.2017.8393364"},{"key":"8836_CR68","doi-asserted-by":"publisher","first-page":"1067","DOI":"10.1016\/j.imavis.2014.09.005","volume":"32","author":"L Zhang","year":"2014","unstructured":"Zhang L, Tjondronegoro D, Chandran V (2014) Representation of facial expression categories in continuous arousal\u2013valence space: feature and correlation. 
Image Vis Comput 32:1067\u20131079","journal-title":"Image Vis Comput"},{"key":"8836_CR69","doi-asserted-by":"crossref","unstructured":"Zhang S, Zhang S, Huang T, Gao W (2016) Multimodal deep convolutional neural network for audio-visual emotion recognition. ACM on international conference on multimedia retrieval 281-284.","DOI":"10.1145\/2911996.2912051"}],"container-title":["Multimedia Tools and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-020-08836-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11042-020-08836-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-020-08836-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T01:52:37Z","timestamp":1723600357000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11042-020-08836-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,17]]},"references-count":69,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["8836"],"URL":"https:\/\/doi.org\/10.1007\/s11042-020-08836-3","relation":{},"ISSN":["1380-7501","1573-7721"],"issn-type":[{"value":"1380-7501","type":"print"},{"value":"1573-7721","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,17]]},"assertion":[{"value":"10 April 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 February 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 
2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 September 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}