{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T06:42:51Z","timestamp":1771915371744,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3551876.3554806","type":"proceedings-article","created":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T22:17:21Z","timestamp":1664403441000},"page":"67-73","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["ViPER"],"prefix":"10.1145","author":[{"given":"Lorenzo","family":"Vaiani","sequence":"first","affiliation":[{"name":"Politecnico di Torino, Turin, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Moreno","family":"La Quatra","sequence":"additional","affiliation":[{"name":"Politecnico di Torino, Turin, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luca","family":"Cagliero","sequence":"additional","affiliation":[{"name":"Politecnico di Torino, Turin, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paolo","family":"Garza","sequence":"additional","affiliation":[{"name":"Politecnico di Torino, Turin, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"nateraw\/vit-age-classifier \u00b7 Hugging Face. https:\/\/huggingface.co\/nateraw\/ vit-age-classifier [Online","year":"2022","unstructured":"2022. nateraw\/vit-age-classifier \u00b7 Hugging Face. https:\/\/huggingface.co\/nateraw\/ vit-age-classifier [Online ; accessed 1. Jul. 2022 ]. 2022. nateraw\/vit-age-classifier \u00b7 Hugging Face. https:\/\/huggingface.co\/nateraw\/ vit-age-classifier [Online; accessed 1. Jul. 2022]."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.3390\/electronics10091036"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.semeval-1.90"},{"key":"e_1_3_2_2_4_1","volume-title":"Towards real-time speech emotion recognition for affective e-learning. Education and information technologies 21, 5","author":"Bahreini Kiavash","year":"2016","unstructured":"Kiavash Bahreini , Rob Nadolski , and Wim Westera . 2016. Towards real-time speech emotion recognition for affective e-learning. Education and information technologies 21, 5 ( 2016 ), 1367--1386. Kiavash Bahreini, Rob Nadolski, and Wim Westera. 2016. Towards real-time speech emotion recognition for affective e-learning. Education and information technologies 21, 5 (2016), 1367--1386."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107134"},{"key":"e_1_3_2_2_6_1","unstructured":"Alessandro Bondielli and Lucia C Passaro. 2021. Leveraging CLIP for Image Emotion Recognition.. In NL4AI@ AI* IA.  Alessandro Bondielli and Lucia C Passaro. 2021. Leveraging CLIP for Image Emotion Recognition.. In NL4AI@ AI* IA."},{"key":"e_1_3_2_2_7_1","volume-title":"Wavlm: Large-scale self-supervised pre-training for full stack speech processing. arXiv preprint arXiv:2110.13900","author":"Chen Sanyuan","year":"2021","unstructured":"Sanyuan Chen , Chengyi Wang , Zhengyang Chen , Yu Wu , Shujie Liu , Zhuo Chen , Jinyu Li , Naoyuki Kanda , Takuya Yoshioka , Xiong Xiao , 2021 . Wavlm: Large-scale self-supervised pre-training for full stack speech processing. arXiv preprint arXiv:2110.13900 (2021). Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, et al. 2021. Wavlm: Large-scale self-supervised pre-training for full stack speech processing. arXiv preprint arXiv:2110.13900 (2021)."},{"key":"e_1_3_2_2_8_1","volume-title":"Emotional Reactions, and Stress. (04","author":"Christ Lukas","year":"2022","unstructured":"Lukas Christ , Shahin Amiriparian , Alice Baird , Panagiotis Tzirakis , Alexander Kathan , Niklas Mueller , Lukas Stappen , Eva Messner , Andreas K\u00f6nig , Alan Cowen , Erik Cambria , and Bj\u00f6rn Schuller . 2022. The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor , Emotional Reactions, and Stress. (04 2022 ). Lukas Christ, Shahin Amiriparian, Alice Baird, Panagiotis Tzirakis, Alexander Kathan, Niklas Mueller, Lukas Stappen, Eva Messner, Andreas K\u00f6nig, Alan Cowen, Erik Cambria, and Bj\u00f6rn Schuller. 2022. The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress. (04 2022)."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_2_10_1","volume-title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly , Jakob Uszkoreit , and Neil Houlsby . 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR ( 2021 ). Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR (2021)."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818346.2830596"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-64559-5_23"},{"key":"e_1_3_2_2_13_1","volume-title":"International conference on machine learning. PMLR, 4651--4664","author":"Jaegle Andrew","year":"2021","unstructured":"Andrew Jaegle , Felix Gimeno , Andy Brock , Oriol Vinyals , Andrew Zisserman , and Joao Carreira . 2021 . Perceiver: General perception with iterative attention . In International conference on machine learning. PMLR, 4651--4664 . Andrew Jaegle, Felix Gimeno, Andy Brock, Oriol Vinyals, Andrew Zisserman, and Joao Carreira. 2021. Perceiver: General perception with iterative attention. In International conference on machine learning. PMLR, 4651--4664."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00159"},{"key":"e_1_3_2_2_15_1","volume-title":"Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu , Myle Ott , Naman Goyal , Jingfei Du , Mandar Joshi , Danqi Chen , Omer Levy , Mike Lewis , Luke Zettlemoyer , and Veselin Stoyanov . 2019 . Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019). Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2021.3122146"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01764"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s42452-020-2234-1"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.108580"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2921241"},{"key":"e_1_3_2_2_21_1","volume-title":"The International Research & Innovation Forum","author":"Pujol Francisco A","unstructured":"Francisco A Pujol , Higinio Mora , and Ana Mart\u00ednez . 2019. Emotion recognition to improve e-healthcare systems in smart cities . In The International Research & Innovation Forum . Springer , 245--254. Francisco A Pujol, Higinio Mora, and Ana Mart\u00ednez. 2019. Emotion recognition to improve e-healthcare systems in smart cities. In The International Research & Innovation Forum. Springer, 245--254."},{"key":"e_1_3_2_2_22_1","unstructured":"Delong Qi Weijun Tan Qi Yao and Jingfeng Liu. 2021. YOLO5Face: Why Reinventing a Face Detector. (2021).  Delong Qi Weijun Tan Qi Yao and Jingfeng Liu. 2021. YOLO5Face: Why Reinventing a Face Detector. (2021)."},{"key":"e_1_3_2_2_23_1","volume-title":"International Conference on Machine Learning. PMLR, 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark , 2021 . Learning transferable visual models from natural language supervision . In International Conference on Machine Learning. PMLR, 8748--8763 . Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748--8763."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-012-9172-2"},{"key":"e_1_3_2_2_25_1","volume-title":"Human behavior understanding in big multimedia data using CNN based facial expression recognition. Mobile networks and applications 25, 4","author":"Sajjad Muhammad","year":"2020","unstructured":"Muhammad Sajjad , Sana Zahir , Amin Ullah , Zahid Akhtar , and Khan Muhammad . 2020. Human behavior understanding in big multimedia data using CNN based facial expression recognition. Mobile networks and applications 25, 4 ( 2020 ), 1611--1621. Muhammad Sajjad, Sana Zahir, Amin Ullah, Zahid Akhtar, and Khan Muhammad. 2020. Human behavior understanding in big multimedia data using CNN based facial expression recognition. Mobile networks and applications 25, 4 (2020), 1611--1621."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Bj\u00f6rn W Schuller Anton Batliner Shahin Amiriparian Christian Bergler Maurice Gerczuk Natalie Holz Pauline Larrouy-Maestri Sebastian P Bayerl Korbinian Riedhammer Adria Mallol-Ragolta etal 2022. The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations Stuttering Activity & Mosquitoes. arXiv preprint arXiv:2205.06799 (2022).  Bj\u00f6rn W Schuller Anton Batliner Shahin Amiriparian Christian Bergler Maurice Gerczuk Natalie Holz Pauline Larrouy-Maestri Sebastian P Bayerl Korbinian Riedhammer Adria Mallol-Ragolta et al. 2022. The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations Stuttering Activity & Mosquitoes. arXiv preprint arXiv:2205.06799 (2022).","DOI":"10.1145\/3503161.3551591"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461375"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3423327.3423672"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3475957.3484456"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00246"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2021.3065598"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2017.2764438"},{"key":"e_1_3_2_2_33_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_2_34_1","volume-title":"Proc. 3rd Int. Conf. on Computer Vision Theory and Applications VISAPP","author":"Wimmer Matthias","year":"2008","unstructured":"Matthias Wimmer , Bj\u00f6rn Schuller , Dejan Arsic , Bernd Radig , and Gerhard Rigoll . 2008 . Low-level fusion of audio and video feature for multi-modal emotion recognition . In Proc. 3rd Int. Conf. on Computer Vision Theory and Applications VISAPP , Funchal, Madeira, Portugal. 145--151. Matthias Wimmer, Bj\u00f6rn Schuller, Dejan Arsic, Bernd Radig, and Gerhard Rigoll. 2008. Low-level fusion of audio and video feature for multi-modal emotion recognition. In Proc. 3rd Int. Conf. on Computer Vision Theory and Applications VISAPP, Funchal, Madeira, Portugal. 145--151."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.31887\/DCNS.2015.17.4\/kwolf"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01630"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3551876.3554806","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3551876.3554806","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:16Z","timestamp":1750186816000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3551876.3554806"}},"subtitle":["Video-based Perceiver for Emotion Recognition"],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":36,"alternative-id":["10.1145\/3551876.3554806","10.1145\/3551876"],"URL":"https:\/\/doi.org\/10.1145\/3551876.3554806","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}