{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T12:41:13Z","timestamp":1760704873425,"version":"build-2065373602"},"publisher-location":"Cham","reference-count":53,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031851865"},{"type":"electronic","value":"9783031851872"}],"license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025]]},"DOI":"10.1007\/978-3-031-85187-2_2","type":"book-chapter","created":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T05:14:23Z","timestamp":1745385263000},"page":"20-36","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["How Do You Perceive My Face? Recognizing Facial Expressions in\u00a0Multi-modal Context by\u00a0Modeling Mental Representations"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7557-1508","authenticated-orcid":false,"given":"Florian","family":"Blume","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7885-8812","authenticated-orcid":false,"given":"Runfeng","family":"Qu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8145-1732","authenticated-orcid":false,"given":"Pia","family":"Bideau","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4564-9834","authenticated-orcid":false,"given":"Martin","family":"Maier","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8438-1570","authenticated-orcid":false,"given":"Rasha Abdel","family":"Rahman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2871-9266","authenticated-orcid":false,"given":"Olaf","family":"Hellwich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,4,24]]},"reference":[{"key":"2_CR1","doi-asserted-by":"publisher","unstructured":"Bates, D., M\u00e4chler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67(1), 1\u201348 (2015). https:\/\/doi.org\/10.18637\/jss.v067.i01","DOI":"10.18637\/jss.v067.i01"},{"issue":"2","key":"2_CR2","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1037\/emo0000545","volume":"20","author":"J Baum","year":"2020","unstructured":"Baum, J., Rabovsky, M., Rose, S.B., Abdel Rahman, R.: Clear judgments based on unclear evidence: person evaluation is strongly influenced by untrustworthy gossip. Emotion 20(2), 248\u2013260 (2020). https:\/\/doi.org\/10.1037\/emo0000545","journal-title":"Emotion"},{"key":"2_CR3","doi-asserted-by":"publisher","DOI":"10.1016\/j.iswa.2022.200139","volume":"16","author":"H Bouzid","year":"2022","unstructured":"Bouzid, H., Ballihi, L.: Facial expression video generation based-on spatio-temporal convolutional GAN: FEV-GAN. Intell. Syst. Appl. 16, 200139 (2022). https:\/\/doi.org\/10.1016\/j.iswa.2022.200139","journal-title":"Intell. Syst. Appl."},{"key":"2_CR4","doi-asserted-by":"publisher","unstructured":"Chudasama, V., Kar, P., Gudmalwar, A., Shah, N., Wasnik, P., Onoe, N.: M2FNet: multi-modal fusion network for emotion recognition in conversation. In: 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2022). https:\/\/doi.org\/10.1109\/CVPRW56347.2022.00511","DOI":"10.1109\/CVPRW56347.2022.00511"},{"key":"2_CR5","doi-asserted-by":"publisher","unstructured":"Chumachenko, K., Iosifidis, A., Gabbouj, M.: Self-attention fusion for audiovisual emotion recognition with incomplete data. In: 2022 26th International Conference on Pattern Recognition, ICPR 2022 (2022). https:\/\/doi.org\/10.1109\/ICPR56361.2022.9956592","DOI":"10.1109\/ICPR56361.2022.9956592"},{"issue":"3","key":"2_CR6","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1017\/S0140525X12000477","volume":"36","author":"A Clark","year":"2013","unstructured":"Clark, A.: Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36(3), 181\u2013204 (2013). https:\/\/doi.org\/10.1017\/S0140525X12000477","journal-title":"Behav. Brain Sci."},{"key":"2_CR7","doi-asserted-by":"publisher","unstructured":"Dahmouni, A., Rossamy, R., Hamdani, M., Guelzim, I., Ait\u00a0Abdelouahad, A.: Bimodal emotional recognition based on long term recurrent convolutional network. In: Proceedings of the 6th International Conference on Networking, Intelligent Systems & Security, NISS 2023 (2023). https:\/\/doi.org\/10.1145\/3607720.3607740","DOI":"10.1145\/3607720.3607740"},{"issue":"1","key":"2_CR8","doi-asserted-by":"publisher","first-page":"16111","DOI":"10.1038\/s41598-023-42802-x","volume":"13","author":"A Eiserbeck","year":"2023","unstructured":"Eiserbeck, A., Maier, M., Baum, J., Abdel Rahman, R.: Deepfake smiles matter less\u2013the psychological and neural impact of presumed AI-generated faces. Sci. Rep. 13(1), 16111 (2023). https:\/\/doi.org\/10.1038\/s41598-023-42802-x","journal-title":"Sci. Rep."},{"key":"2_CR9","doi-asserted-by":"publisher","first-page":"3480","DOI":"10.1109\/TMM.2021.3099900","volume":"24","author":"SE Eskimez","year":"2022","unstructured":"Eskimez, S.E., Zhang, Y., Duan, Z.: Speech driven talking face generation from a single image and an emotion condition. IEEE Trans. Multimedia 24, 3480\u20133490 (2022). https:\/\/doi.org\/10.1109\/TMM.2021.3099900","journal-title":"IEEE Trans. Multimedia"},{"key":"2_CR10","doi-asserted-by":"publisher","unstructured":"Fang, Z., Liu, Z., Liu, T., Hung, C.C., Xiao, J., Feng, G.: Facial expression GAN for voice-driven face generation. Vis. Comput. 38(3), 1151\u20131164 (2022). https:\/\/doi.org\/10.1007\/s00371-021-02074-w","DOI":"10.1007\/s00371-021-02074-w"},{"key":"2_CR11","doi-asserted-by":"publisher","unstructured":"Franceschini, R., Fini, E., Beyan, C., Conti, A., Arrigoni, F., Ricci, E.: Multimodal emotion recognition with modality-pairwise unsupervised contrastive loss. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 2589\u20132596 (2022). https:\/\/doi.org\/10.1109\/ICPR56361.2022.9956589","DOI":"10.1109\/ICPR56361.2022.9956589"},{"key":"2_CR12","unstructured":"Fu, Z., et al.: A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition. ArXiv (2021)"},{"key":"2_CR13","doi-asserted-by":"publisher","unstructured":"Ghaleb, E., Niehues, J., Asteriadis, S.: Multimodal attention-mechanism for temporal emotion recognition. In: 2020 IEEE International Conference on Image Processing (ICIP) (2020). https:\/\/doi.org\/10.1109\/ICIP40778.2020.9191019","DOI":"10.1109\/ICIP40778.2020.9191019"},{"key":"2_CR14","unstructured":"Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (2014)"},{"key":"2_CR15","unstructured":"Huang, H., Li, Z., He, R., Sun, Z., Tan, T.: IntroVAE: introspective variational autoencoders for photographic image synthesis. In: Neural Information Processing Systems (2018)"},{"key":"2_CR16","doi-asserted-by":"publisher","unstructured":"Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. https:\/\/doi.org\/10.48550\/arXiv.1412.6980","DOI":"10.48550\/arXiv.1412.6980"},{"key":"2_CR17","doi-asserted-by":"publisher","unstructured":"Kosti, R., Alvarez, J., Recasens, A., Lapedriza, A.: Context based emotion recognition using EMOTIC dataset. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https:\/\/doi.org\/10.1109\/TPAMI.2019.2916866","DOI":"10.1109\/TPAMI.2019.2916866"},{"key":"2_CR18","doi-asserted-by":"publisher","unstructured":"Kosti, R., Alvarez, J.M., Recasens, A., Lapedriza, A.: Emotion recognition in context. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017). https:\/\/doi.org\/10.1109\/CVPR.2017.212","DOI":"10.1109\/CVPR.2017.212"},{"key":"2_CR19","unstructured":"Larsen, A.B.L., S\u00f8nderby, S.K., Larochelle, H., Winther, O.: Autoencoding beyond pixels using a learned similarity metric. In: Proceedings of The 33rd International Conference on Machine Learning (2016)"},{"key":"2_CR20","doi-asserted-by":"publisher","unstructured":"Lee, J., Kim, S., Kim, S., Park, J., Sohn, K.: Context-aware emotion recognition networks. In: 2019 IEEE\/CVF International Conference on Computer Vision (ICCV) (2019). https:\/\/doi.org\/10.1109\/ICCV.2019.01024","DOI":"10.1109\/ICCV.2019.01024"},{"key":"2_CR21","doi-asserted-by":"publisher","unstructured":"Li, Y., Wang, Y., Cui, Z.: Decoupled multimodal distilling for emotion recognition. In: 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023). https:\/\/doi.org\/10.1109\/CVPR52729.2023.00641","DOI":"10.1109\/CVPR52729.2023.00641"},{"key":"2_CR22","doi-asserted-by":"publisher","unstructured":"Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: 2015 IEEE International Conference on Computer Vision (ICCV) (2015). https:\/\/doi.org\/10.1109\/ICCV.2015.425","DOI":"10.1109\/ICCV.2015.425"},{"issue":"5","key":"2_CR23","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0196391","volume":"13","author":"SR Livingstone","year":"2018","unstructured":"Livingstone, S.R., Russo, F.A.: The ryerson audio-visual database of emotional speech and song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in north American English. PLoS ONE 13(5), e0196391 (2018). https:\/\/doi.org\/10.1371\/journal.pone.0196391","journal-title":"PLoS ONE"},{"issue":"1","key":"2_CR24","doi-asserted-by":"publisher","first-page":"327","DOI":"10.3390\/app12010327","volume":"12","author":"C Luna-Jim\u00e9nez","year":"2021","unstructured":"Luna-Jim\u00e9nez, C., Kleinlein, R., Griol, D., Callejas, Z., Montero, J.M., Fern\u00e1ndez-Mart\u00ednez, F.: A proposal for multimodal emotion recognition using aural transformers and action units on RAVDESS dataset. Appl. Sci. 12(1), 327 (2021). https:\/\/doi.org\/10.3390\/app12010327","journal-title":"Appl. Sci."},{"key":"2_CR25","doi-asserted-by":"publisher","unstructured":"Ma, Y., Zhang, S., Wang, J., Wang, X., Zhang, Y., Deng, Z.: DreamTalk: when expressive talking head generation meets diffusion probabilistic models. https:\/\/doi.org\/10.48550\/arXiv.2312.09767","DOI":"10.48550\/arXiv.2312.09767"},{"key":"2_CR26","doi-asserted-by":"publisher","DOI":"10.1016\/j.concog.2022.103301","volume":"101","author":"M Maier","year":"2022","unstructured":"Maier, M., Blume, F., Bideau, P., Hellwich, O., Abdel Rahman, R.: Knowledge-augmented face perception: prospects for the Bayesian brain-framework to align AI and human vision. Conscious. Cogn. 101, 103301 (2022). https:\/\/doi.org\/10.1016\/j.concog.2022.103301","journal-title":"Conscious. Cogn."},{"key":"2_CR27","doi-asserted-by":"publisher","unstructured":"Mittal, T., Bhattacharya, U., Chandra, R., Bera, A., Manocha, D.: M3ER: multiplicative multimodal emotion recognition using facial, textual, and speech cues. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol.\u00a034 (2020). https:\/\/doi.org\/10.1609\/aaai.v34i02.5492","DOI":"10.1609\/aaai.v34i02.5492"},{"key":"2_CR28","doi-asserted-by":"publisher","unstructured":"Mittal, T., Guhan, P., Bhattacharya, U., Chandra, R., Bera, A., Manocha, D.: EmotiCon: context-aware multimodal emotion recognition using frege\u2019s principle. In: 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14222\u201314231 (2020). https:\/\/doi.org\/10.1109\/CVPR42600.2020.01424","DOI":"10.1109\/CVPR42600.2020.01424"},{"key":"2_CR29","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2023.104676","volume":"133","author":"B Mocanu","year":"2023","unstructured":"Mocanu, B., Tapu, R., Zaharia, T.: Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning. Image Vis. Comput. 133, 104676 (2023). https:\/\/doi.org\/10.1016\/j.imavis.2023.104676","journal-title":"Image Vis. Comput."},{"issue":"3","key":"2_CR30","doi-asserted-by":"publisher","first-page":"2257","DOI":"10.1016\/j.neuroimage.2010.10.047","volume":"54","author":"VI M\u00fcller","year":"2011","unstructured":"M\u00fcller, V.I., et al.: Incongruence effects in crossmodal emotional integration. Neuroimage 54(3), 2257\u20132266 (2011). https:\/\/doi.org\/10.1016\/j.neuroimage.2010.10.047","journal-title":"Neuroimage"},{"key":"2_CR31","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1016\/j.bandc.2016.05.002","volume":"112","author":"M Otten","year":"2017","unstructured":"Otten, M., Seth, A.K., Pinto, Y.: A social bayesian brain: how social knowledge can shape visual perception. Brain Cogn. 112, 69\u201377 (2017). https:\/\/doi.org\/10.1016\/j.bandc.2016.05.002","journal-title":"Brain Cogn."},{"key":"2_CR32","doi-asserted-by":"publisher","unstructured":"Peng, Z., et al.: EmoTalk: speech-driven emotional disentanglement for 3D face animation. In: 2023 IEEE\/CVF International Conference on Computer Vision (ICCV) (2023). https:\/\/doi.org\/10.1109\/ICCV51070.2023.01891","DOI":"10.1109\/ICCV51070.2023.01891"},{"key":"2_CR33","unstructured":"Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning (2021)"},{"issue":"1","key":"2_CR34","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1038\/4580","volume":"2","author":"RPN Rao","year":"1999","unstructured":"Rao, R.P.N., Ballard, D.H.: Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2(1), 79\u201387 (1999). https:\/\/doi.org\/10.1038\/4580","journal-title":"Nat. Neurosci."},{"key":"2_CR35","doi-asserted-by":"publisher","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022). https:\/\/doi.org\/10.1109\/CVPR52688.2022.01042","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"2_CR36","doi-asserted-by":"publisher","unstructured":"Sadok, S., Leglaive, S., Girin, L., Alameda-Pineda, X., S\u00e9guier, R.: A multimodal dynamical variational autoencoder for audiovisual speech representation learning 172, 106120. https:\/\/doi.org\/10.1016\/j.neunet.2024.106120","DOI":"10.1016\/j.neunet.2024.106120"},{"key":"2_CR37","doi-asserted-by":"publisher","unstructured":"Schneider, S., Baevski, A., Collobert, R., Auli, M.: Wav2Vec: unsupervised pre-training for speech recognition. In: Proceedings of the Interspeech (2019). https:\/\/doi.org\/10.21437\/Interspeech.2019-1873","DOI":"10.21437\/Interspeech.2019-1873"},{"key":"2_CR38","unstructured":"Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations (ICLR 2015) (2015)"},{"key":"2_CR39","doi-asserted-by":"publisher","unstructured":"Sinha, S., Biswas, S., Yadav, R., Bhowmick, B.: Emotion-controllable generalized talking face generation. In: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (2022). https:\/\/doi.org\/10.24963\/ijcai.2022\/184","DOI":"10.24963\/ijcai.2022\/184"},{"key":"2_CR40","doi-asserted-by":"crossref","unstructured":"Stypu\u0142kowski, M., Vougioukas, K., He, S., Zi\u0119ba, M., Petridis, S., Pantic, M.: Diffused heads: diffusion models beat GANs on talking-face generation. In: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV) (2024)","DOI":"10.1109\/WACV57701.2024.00502"},{"issue":"4","key":"2_CR41","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1093\/scan\/nsu088","volume":"10","author":"F Suess","year":"2015","unstructured":"Suess, F., Rabovsky, M., Abdel Rahman, R.: Perceiving emotions in neutral faces: expression processing is biased by affective person knowledge. Soc. Cognit. Affect. Neurosci. 10(4), 531\u2013536 (2015). https:\/\/doi.org\/10.1093\/scan\/nsu088","journal-title":"Soc. Cognit. Affect. Neurosci."},{"issue":"1","key":"2_CR42","doi-asserted-by":"publisher","first-page":"718","DOI":"10.1109\/TAFFC.2020.3029531","volume":"14","author":"N Sun","year":"2023","unstructured":"Sun, N., Lu, Q., Zheng, W., Liu, J., Han, G.: Unsupervised cross-view facial expression image generation and recognition. IEEE Trans. Affect. Comput. 14(1), 718\u2013731 (2023). https:\/\/doi.org\/10.1109\/TAFFC.2020.3029531","journal-title":"IEEE Trans. Affect. Comput."},{"key":"2_CR43","unstructured":"Vaswani, A., et al.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol.\u00a030 (2017)"},{"key":"2_CR44","doi-asserted-by":"publisher","unstructured":"Wang, K., et al.: MEAD: a large-scale audio-visual dataset for emotional talking-face generation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision \u2013 ECCV, vol. 12366 (2020). https:\/\/doi.org\/10.1007\/978-3-030-58589-1_42","DOI":"10.1007\/978-3-030-58589-1_42"},{"key":"2_CR45","doi-asserted-by":"publisher","unstructured":"Wieser, M.J., Brosch, T.: Faces in context: a review and systematization of contextual influences on affective face processing. Front. Psychol. 3 (2012). https:\/\/doi.org\/10.3389\/fpsyg.2012.00471","DOI":"10.3389\/fpsyg.2012.00471"},{"key":"2_CR46","doi-asserted-by":"publisher","unstructured":"Xu, C., et al.: High-fidelity generalized emotional talking face generation with multi-modal emotion space learning. In: 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023). https:\/\/doi.org\/10.1109\/CVPR52729.2023.00639","DOI":"10.1109\/CVPR52729.2023.00639"},{"key":"2_CR47","doi-asserted-by":"publisher","unstructured":"Xu, Y., Deng, B., Wang, J., Jing, Y., Pan, J., He, S.: High-resolution face swapping via latent semantics disentanglement. In: 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022). https:\/\/doi.org\/10.1109\/CVPR52688.2022.00749","DOI":"10.1109\/CVPR52688.2022.00749"},{"issue":"11","key":"2_CR48","doi-asserted-by":"publisher","first-page":"2792","DOI":"10.1109\/TMM.2019.2962317","volume":"22","author":"Y Yan","year":"2020","unstructured":"Yan, Y., Huang, Y., Chen, S., Shen, C., Wang, H.: Joint deep learning of facial expression synthesis and recognition. IEEE Trans. Multimedia 22(11), 2792\u20132807 (2020). https:\/\/doi.org\/10.1109\/TMM.2019.2962317","journal-title":"IEEE Trans. Multimedia"},{"key":"2_CR49","doi-asserted-by":"publisher","unstructured":"Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. 23(10), 1499\u20131503. https:\/\/doi.org\/10.1109\/LSP.2016.2603342","DOI":"10.1109\/LSP.2016.2603342"},{"key":"2_CR50","doi-asserted-by":"publisher","unstructured":"Zhang, S., Pan, Y., Wang, J.Z.: Learning emotion representations from verbal and nonverbal communication. In: 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023). https:\/\/doi.org\/10.1109\/CVPR52729.2023.01821","DOI":"10.1109\/CVPR52729.2023.01821"},{"issue":"3","key":"2_CR51","doi-asserted-by":"publisher","first-page":"1681","DOI":"10.1109\/TCSVT.2021.3056098","volume":"32","author":"X Zhang","year":"2022","unstructured":"Zhang, X., Zhang, F., Xu, C.: Joint expression synthesis and representation learning for facial expression recognition. IEEE Trans. Circuits Syst. Video Technol. 32(3), 1681\u20131695 (2022). https:\/\/doi.org\/10.1109\/TCSVT.2021.3056098","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"2_CR52","doi-asserted-by":"publisher","unstructured":"Zhang, Z., Wang, L., Yang, J.: Weakly supervised video emotion detection and prediction via cross-modal temporal erasing network. In: 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023). https:\/\/doi.org\/10.1109\/CVPR52729.2023.01811","DOI":"10.1109\/CVPR52729.2023.01811"},{"issue":"4","key":"2_CR53","doi-asserted-by":"publisher","first-page":"2595","DOI":"10.1109\/TAFFC.2023.3282704","volume":"14","author":"W Zheng","year":"2023","unstructured":"Zheng, W., Yan, L., Wang, F.Y.: Two birds with one stone: knowledge-embedded temporal convolutional transformer for depression detection and emotion recognition. IEEE Trans. Affect. Comput. 14(4), 2595\u20132613 (2023). https:\/\/doi.org\/10.1109\/TAFFC.2023.3282704","journal-title":"IEEE Trans. Affect. Comput."}],"container-title":["Lecture Notes in Computer Science","Pattern Recognition"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-85187-2_2","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T12:03:09Z","timestamp":1760702589000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-85187-2_2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"ISBN":["9783031851865","9783031851872"],"references-count":53,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-85187-2_2","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2025]]},"assertion":[{"value":"24 April 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Disclosure of Interests"}}]}}