{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:18:11Z","timestamp":1750220291998,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":31,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Science and Technology Innovation Foundation of Shenzhen","award":["JCYJ20180504165826861"],"award-info":[{"award-number":["JCYJ20180504165826861"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,3,18]]},"DOI":"10.1145\/3531232.3531243","type":"proceedings-article","created":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T16:09:37Z","timestamp":1654099777000},"page":"80-84","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Two-Stage Self-Supervised Learning for Facial Action Unit Recognition"],"prefix":"10.1145","author":[{"given":"Hao","family":"Cheng","sequence":"first","affiliation":[{"name":"Beijing Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiang","family":"Xie","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, China and Shenzhen Research Institute, Beijing Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuang","family":"Liang","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,6]]},"reference":[{"issue":"2","key":"e_1_3_2_1_1_1","first-page":"5","article-title":"Facial action coding system: a technique 
for the measurement of facial movement","volume":"3","author":"Ekman Paul","year":"1978","unstructured":"Friesen, E., and Paul Ekman . Facial action coding system: a technique for the measurement of facial movement . Palo Alto 3 , no. 2 ( 1978 ): 5 . Friesen, E., and Paul Ekman. Facial action coding system: a technique for the measurement of facial movement. Palo Alto 3, no. 2 (1978): 5.","journal-title":"Palo Alto"},{"key":"e_1_3_2_1_2_1","first-page":"10924","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zeng Jiabei","year":"2019","unstructured":"Li, Yong, Jiabei Zeng , Shiguang Shan , and Xilin Chen . Self-supervised representation learning from videos for facial action unit detection . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition , pp. 10924 - 10933 . 2019 . Li, Yong, Jiabei Zeng, Shiguang Shan, and Xilin Chen. Self-supervised representation learning from videos for facial action unit detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 10924-10933. 2019."},{"key":"e_1_3_2_1_3_1","volume-title":"BMVC.","author":"Tavabi Leili","year":"2020","unstructured":"Lu, Liupei, Leili Tavabi , and Mohammad Soleymani . Self-Supervised Learning for Facial Action Unit Recognition through Temporal Consistency . In BMVC. 2020 . Lu, Liupei, Leili Tavabi, and Mohammad Soleymani. Self-Supervised Learning for Facial Action Unit Recognition through Temporal Consistency. In BMVC. 2020."},{"key":"e_1_3_2_1_4_1","first-page":"302","volume-title":"BMVC","author":"Sophia A.","unstructured":"Koepke, A. Sophia , Olivia Wiles , and Andrew Zisserman . Self-supervised learning of a facial attribute embedding from video . In BMVC , p. 302 . 2018. Koepke, A. Sophia, Olivia Wiles, and Andrew Zisserman. Self-supervised learning of a facial attribute embedding from video. In BMVC, p. 302. 
2018."},{"key":"e_1_3_2_1_5_1","first-page":"1038","volume-title":"Shiliang Pu. Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action Unit Recognition. In Proceedings of the 29th ACM International Conference on Multimedia","author":"Wang Jingjing","year":"2021","unstructured":"Yan, Jingwei, Jingjing Wang , Qiang Li , Chunmao Wang , and Shiliang Pu. Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action Unit Recognition. In Proceedings of the 29th ACM International Conference on Multimedia , pp. 1038 - 1046 . 2021 . Yan, Jingwei, Jingjing Wang, Qiang Li, Chunmao Wang, and Shiliang Pu. Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action Unit Recognition. In Proceedings of the 29th ACM International Conference on Multimedia, pp. 1038-1046. 2021."},{"key":"e_1_3_2_1_6_1","volume-title":"An empirical study of training self-supervised vision transformers. arXiv preprint arXiv:2104.02057","author":"Xie Saining","year":"2021","unstructured":"Chen, Xinlei, Saining Xie , and Kaiming He . An empirical study of training self-supervised vision transformers. arXiv preprint arXiv:2104.02057 ( 2021 ). Chen, Xinlei, Saining Xie, and Kaiming He. An empirical study of training self-supervised vision transformers. arXiv preprint arXiv:2104.02057 (2021)."},{"key":"e_1_3_2_1_7_1","volume-title":"Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377","author":"Chen Xinlei","year":"2021","unstructured":"He, Kaiming, Xinlei Chen , Saining Xie , Yanghao Li , Piotr Doll\u00e1r , and Ross Girshick . Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377 ( 2021 ). He, Kaiming, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll\u00e1r, and Ross Girshick. Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377 (2021)."},{"key":"e_1_3_2_1_8_1","volume-title":"Carl Doersch Bootstrap your own latent: A new approach to self-supervised learning. 
arXiv preprint arXiv:2006.07733","author":"Strub Florian","year":"2020","unstructured":"Grill, Jean-Bastien, Florian Strub , Florent Altch\u00e9 , Corentin Tallec , Pierre H. Richemond , Elena Buchatskaya , Carl Doersch Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733 ( 2020 ). Grill, Jean-Bastien, Florian Strub, Florent Altch\u00e9, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733 (2020)."},{"key":"e_1_3_2_1_9_1","first-page":"1691","volume-title":"International Conference on Machine Learning","author":"Radford Alec","year":"2020","unstructured":"Chen, Mark, Alec Radford , Rewon Child , Jeffrey Wu , Heewoo Jun , David Luan , and Ilya Sutskever . Generative pretraining from pixels . In International Conference on Machine Learning , pp. 1691 - 1703 . PMLR, 2020 . Chen, Mark, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever. Generative pretraining from pixels. In International Conference on Machine Learning, pp. 1691-1703. PMLR, 2020."},{"key":"e_1_3_2_1_10_1","volume-title":"Masked Feature Prediction for Self-Supervised Visual Pre-Training. arXiv preprint arXiv:2112.09133","author":"Fan Haoqi","year":"2021","unstructured":"Wei, Chen, Haoqi Fan , Saining Xie , Chao-Yuan Wu , Alan Yuille , and Christoph Feichtenhofer . Masked Feature Prediction for Self-Supervised Visual Pre-Training. arXiv preprint arXiv:2112.09133 ( 2021 ). Wei, Chen, Haoqi Fan, Saining Xie, Chao-Yuan Wu, Alan Yuille, and Christoph Feichtenhofer. Masked Feature Prediction for Self-Supervised Visual Pre-Training. arXiv preprint arXiv:2112.09133 (2021)."},{"key":"e_1_3_2_1_11_1","volume-title":"Mostafa Dehghani An image is worth 16x16 words: Transformers for image recognition at scale. 
arXiv preprint arXiv:2010.11929","author":"Beyer Lucas","year":"2020","unstructured":"Dosovitskiy, Alexey, Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 ( 2020 ). Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_3_2_1_12_1","first-page":"67","volume-title":"2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018","author":"Shen Li","year":"2018","unstructured":"Cao, Qiong, Li Shen , Weidi Xie , Omkar M. Parkhi , and Andrew Zisserman . Vggface2 : A dataset for recognising faces across pose and age . In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018 ), pp. 67 - 74 . IEEE, 2018 . Cao, Qiong, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. Vggface2: A dataset for recognising faces across pose and age. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), pp. 67-74. IEEE, 2018."},{"key":"e_1_3_2_1_13_1","volume-title":"Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748","author":"Aaron","year":"2018","unstructured":"Oord, Aaron van den, Yazhe Li , and Oriol Vinyals . Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 ( 2018 ). Oord, Aaron van den, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. 
arXiv preprint arXiv:1807.03748 (2018)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-AFFC.2013.4"},{"key":"e_1_3_2_1_15_1","first-page":"1817","volume-title":"2012 19th IEEE International Conference on Image Processing","author":"Mohammad S.","year":"2012","unstructured":"Mavadati, S. Mohammad , Mohammad H. Mahoor , Kevin Bartlett , and Philip Trinh . Automatic detection of non-posed facial action units . In 2012 19th IEEE International Conference on Image Processing , pp. 1817 - 1820 . IEEE, 2012 . Mavadati, S. Mohammad, Mohammad H. Mahoor, Kevin Bartlett, and Philip Trinh. Automatic detection of non-posed facial action units. In 2012 19th IEEE International Conference on Image Processing, pp. 1817-1820. IEEE, 2012."},{"key":"e_1_3_2_1_16_1","first-page":"5998","volume-title":"Advances in neural information processing systems","author":"Shazeer Noam","year":"2017","unstructured":"Vaswani, Ashish, Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . Attention is all you need . In Advances in neural information processing systems , pp. 5998 - 6008 . 2017 . Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pp. 5998-6008. 2017."},{"key":"e_1_3_2_1_17_1","volume-title":"BEiT: BERT Pre-Training of Image Transformers. arXiv preprint arXiv:2106.08254","author":"Dong Li","year":"2021","unstructured":"Bao, Hangbo, Li Dong , and Furu Wei . BEiT: BERT Pre-Training of Image Transformers. arXiv preprint arXiv:2106.08254 ( 2021 ). Bao, Hangbo, Li Dong, and Furu Wei. BEiT: BERT Pre-Training of Image Transformers. 
arXiv preprint arXiv:2106.08254 (2021)."},{"key":"e_1_3_2_1_18_1","first-page":"11917","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Han Hu","year":"2019","unstructured":"Niu, Xuesong, Hu Han , Songfan Yang , Yan Huang , and Shiguang Shan . Local relationship learning with person-specific shape regularization for facial action unit detection . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition , pp. 11917 - 11926 . 2019 . Niu, Xuesong, Hu Han, Songfan Yang, Yan Huang, and Shiguang Shan. Local relationship learning with person-specific shape regularization for facial action unit detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 11917-11926. 2019."},{"key":"e_1_3_2_1_19_1","first-page":"8594","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"33","author":"Zhu Xin","year":"2019","unstructured":"Li, Guanbin, Xin Zhu , Yirui Zeng , Qing Wang , and Liang Lin . Semantic relationships guided representation learning for facial action unit recognition . In Proceedings of the AAAI Conference on Artificial Intelligence , vol. 33 , no. 01, pp. 8594 - 8601 . 2019 . Li, Guanbin, Xin Zhu, Yirui Zeng, Qing Wang, and Liang Lin. Semantic relationships guided representation learning for facial action unit recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 8594-8601. 2019."},{"key":"e_1_3_2_1_20_1","volume-title":"Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907","author":"Thomas","year":"2016","unstructured":"Kipf, Thomas N., and Max Welling . Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 ( 2016 ). Kipf, Thomas N., and Max Welling. Semi-supervised classification with graph convolutional networks. 
arXiv preprint arXiv:1609.02907 (2016)."},{"key":"e_1_3_2_1_21_1","volume-title":"Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101","author":"Hutter Frank","year":"2017","unstructured":"Loshchilov, Ilya, and Frank Hutter . Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 ( 2017 ). Loshchilov, Ilya, and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)."},{"key":"e_1_3_2_1_22_1","volume-title":"large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677","author":"Doll\u00e1r Piotr","year":"2017","unstructured":"Goyal, Priya, Piotr Doll\u00e1r , Ross Girshick , Pieter Noordhuis , Lukasz Wesolowski , Aapo Kyrola , Andrew Tulloch , Yangqing Jia , and Kaiming He. Accurate , large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 ( 2017 ). Goyal, Priya, Piotr Doll\u00e1r, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)."},{"key":"e_1_3_2_1_23_1","first-page":"646","volume-title":"European conference on computer vision","author":"Sun Yu","year":"2016","unstructured":"Huang, Gao, Yu Sun , Zhuang Liu , Daniel Sedra , and Kilian Q. Weinberger . Deep networks with stochastic depth . In European conference on computer vision , pp. 646 - 661 . Springer, Cham , 2016 . Huang, Gao, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In European conference on computer vision, pp. 646-661. Springer, Cham, 2016."},{"key":"e_1_3_2_1_24_1","first-page":"6023","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Han Dongyoon","year":"2019","unstructured":"Yun, Sangdoo, Dongyoon Han , Seong Joon Oh , Sanghyuk Chun , Junsuk Choe , and Youngjoon Yoo . 
Cutmix : Regularization strategy to train strong classifiers with localizable features . In Proceedings of the IEEE\/CVF International Conference on Computer Vision , pp. 6023 - 6032 . 2019 . Yun, Sangdoo, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 6023-6032. 2019."},{"key":"e_1_3_2_1_25_1","volume-title":"International Conference on Learning Representations.","author":"Cisse Moustapha","year":"2018","unstructured":"Zhang, Hongyi, Moustapha Cisse , Yann N. Dauphin , and David Lopez-Paz . mixup : Beyond Empirical Risk Minimization . In International Conference on Learning Representations. 2018 . Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations. 2018."},{"key":"e_1_3_2_1_26_1","first-page":"2818","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Vanhoucke Vincent","year":"2016","unstructured":"Szegedy, Christian, Vincent Vanhoucke , Sergey Ioffe , Jon Shlens , and Zbigniew Wojna . Rethinking the inception architecture for computer vision . In Proceedings of the IEEE conference on computer vision and pattern recognition , pp. 2818 - 2826 . 2016 . Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818-2826. 2016."},{"volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition.","year":"2020","key":"e_1_3_2_1_27_1","unstructured":"He, Kaiming , Momentum contrast for unsupervised visual representation learning . 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2020 . He, Kaiming, Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2020."},{"key":"e_1_3_2_1_28_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Diederik","year":"2014","unstructured":"Kingma, Diederik P., and Jimmy Ba . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 ( 2014 ). Kingma, Diederik P., and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_29_1","article-title":"Learning representations for facial actions from unlabeled videos","author":"Zeng Jiabei","year":"2020","unstructured":"Li, Yong, Jiabei Zeng , and Shiguang Shan . Learning representations for facial actions from unlabeled videos . IEEE Transactions on Pattern Analysis and Machine Intelligence ( 2020 ). Li, Yong, Jiabei Zeng, and Shiguang Shan. Learning representations for facial actions from unlabeled videos. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_1_30_1","first-page":"705","volume-title":"Proceedings of the European conference on computer vision (ECCV)","author":"Liu Zhilei","year":"2018","unstructured":"Shao, Zhiwen, Zhilei Liu , Jianfei Cai , and Lizhuang Ma . Deep adaptive attention for joint facial action unit detection and face alignment . In Proceedings of the European conference on computer vision (ECCV) , pp. 705 - 720 . 2018 . Shao, Zhiwen, Zhilei Liu, Jianfei Cai, and Lizhuang Ma. Deep adaptive attention for joint facial action unit detection and face alignment. In Proceedings of the European conference on computer vision (ECCV), pp. 705-720. 
2018."},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"1","author":"Chen Lisha","year":"2021","unstructured":"Song, Tengfei, Lisha Chen , Wenming Zheng , and Qiang Ji . Uncertain graph neural networks for facial action unit detection . In Proceedings of the AAAI Conference on Artificial Intelligence , vol. 1 . 2021 . Song, Tengfei, Lisha Chen, Wenming Zheng, and Qiang Ji. Uncertain graph neural networks for facial action unit detection. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 1. 2021."}],"event":{"name":"IVSP 2022: 2022 4th International Conference on Image, Video and Signal Processing","acronym":"IVSP 2022","location":"Singapore Singapore"},"container-title":["2022 4th International Conference on Image, Video and Signal Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3531232.3531243","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3531232.3531243","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:31Z","timestamp":1750188691000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3531232.3531243"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,18]]},"references-count":31,"alternative-id":["10.1145\/3531232.3531243","10.1145\/3531232"],"URL":"https:\/\/doi.org\/10.1145\/3531232.3531243","relation":{},"subject":[],"published":{"date-parts":[[2022,3,18]]},"assertion":[{"value":"2022-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}