{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T14:27:23Z","timestamp":1769178443554,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,3,7]],"date-time":"2021-03-07T00:00:00Z","timestamp":1615075200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100003453","name":"Natural Science Foundation of Guangdong Province","doi-asserted-by":"publisher","award":["2019A1515010939,2017B010116001"],"award-info":[{"award-number":["2019A1515010939,2017B010116001"]}],"id":[{"id":"10.13039\/501100003453","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61876224"],"award-info":[{"award-number":["61876224"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,3,7]]},"DOI":"10.1145\/3444685.3446289","type":"proceedings-article","created":{"date-parts":[[2021,5,4]],"date-time":"2021-05-04T04:48:41Z","timestamp":1620103721000},"page":"1-6","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Motion-transformer"],"prefix":"10.1145","author":[{"given":"Yi-Bin","family":"Cheng","sequence":"first","affiliation":[{"name":"Sun Yat-sen University, Guangzhou, Guangdong"}]},{"given":"Xipeng","family":"Chen","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Guangzhou, Guangdong"}]},{"given":"Dongyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Guangzhou, Guangdong"}]},{"given":"Liang","family":"Lin","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Guangzhou, Guangdong"}]}],"member":"320","published-online":{"date-parts":[[2021,5,3]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"End-to-End Object Detection with Transformers. CoRR abs\/2005.12872","author":"Carion Nicolas","year":"2020","unstructured":"Nicolas Carion , Francisco Massa , Gabriel Synnaeve , Nicolas Usunier , Alexander Kirillov , and Sergey Zagoruyko . 2020. End-to-End Object Detection with Transformers. CoRR abs\/2005.12872 ( 2020 ). Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-End Object Detection with Transformers. CoRR abs\/2005.12872 (2020)."},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019 , Minneapolis, MN, USA, June 2--7 , 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 4171--4186. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2--7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 4171--4186."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.167"},{"key":"e_1_3_2_1_4_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015","author":"Du Yong","year":"2015","unstructured":"Yong Du , Wei Wang , and Liang Wang . 2015 . Hierarchical recurrent neural network for skeleton based action recognition . In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 , Boston, MA, USA, June 7--12 , 2015. IEEE Computer Society, 1110--1118. Yong Du, Wei Wang, and Liang Wang. 2015. Hierarchical recurrent neural network for skeleton based action recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7--12, 2015. IEEE Computer Society, 1110--1118."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1756025"},{"key":"e_1_3_2_1_6_1","volume-title":"6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.","author":"Gidaris Spyros","year":"2018","unstructured":"Spyros Gidaris , Praveer Singh , and Nikos Komodakis . 2018 . Unsupervised Representation Learning by Predicting Image Rotations . In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. Spyros Gidaris, Praveer Singh, and Nikos Komodakis. 2018. Unsupervised Representation Learning by Predicting Image Rotations. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018545"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.207"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.79"},{"key":"e_1_3_2_1_10_1","volume-title":"Skeleton-Based Relational Modeling for Action Recognition. CoRR abs\/1805.02556","author":"Li Lin","year":"2018","unstructured":"Lin Li , Wu Zheng , Zhaoxiang Zhang , Yan Huang , and Liang Wang . 2018. Skeleton-Based Relational Modeling for Action Recognition. CoRR abs\/1805.02556 ( 2018 ). Lin Li, Wu Zheng, Zhaoxiang Zhang, Yan Huang, and Liang Wang. 2018. Skeleton-Based Relational Modeling for Action Recognition. CoRR abs\/1805.02556 (2018)."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00572"},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings, Part III (Lecture Notes in Computer Science","volume":"833","author":"Liu Jun","year":"2016","unstructured":"Jun Liu , Amir Shahroudy , Dong Xu , and Gang Wang . 2016 . Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016 , Proceedings, Part III (Lecture Notes in Computer Science , Vol. 9907). Springer, 816-- 833 . Jun Liu, Amir Shahroudy, Dong Xu, and Gang Wang. 2016. Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016, Proceedings, Part III (Lecture Notes in Computer Science, Vol. 9907). Springer, 816--833."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2017.02.030"},{"key":"e_1_3_2_1_14_1","volume-title":"Unsupervised Learning of Long-Term Motion Dynamics for Videos. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017","author":"Luo Zelun","year":"2017","unstructured":"Zelun Luo , Boya Peng , De-An Huang , Alexandre Alahi , and Li Fei-Fei . 2017 . Unsupervised Learning of Long-Term Motion Dynamics for Videos. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 , Honolulu, HI, USA, July 21--26 , 2017. IEEE Computer Society, 7101--7110. Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, and Li Fei-Fei. 2017. Unsupervised Learning of Long-Term Motion Dynamics for Videos. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21--26, 2017. IEEE Computer Society, 7101--7110."},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings, Part I (Lecture Notes in Computer Science","volume":"544","author":"Misra Ishan","year":"2016","unstructured":"Ishan Misra , C. Lawrence Zitnick , and Martial Hebert . 2016 . Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016 , Proceedings, Part I (Lecture Notes in Computer Science , Vol. 9905). Springer, 527-- 544 . Ishan Misra, C. Lawrence Zitnick, and Martial Hebert. 2016. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 9905). Springer, 527--544."},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings, Part VI (Lecture Notes in Computer Science","volume":"84","author":"Noroozi Mehdi","year":"2016","unstructured":"Mehdi Noroozi and Paolo Favaro . 2016 . Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016 , Proceedings, Part VI (Lecture Notes in Computer Science , Vol. 9910). Springer, 69-- 84 . Mehdi Noroozi and Paolo Favaro. 2016. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11--14, 2016, Proceedings, Part VI (Lecture Notes in Computer Science, Vol. 9910). Springer, 69--84."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.638"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.278"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123299"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.115"},{"key":"e_1_3_2_1_21_1","volume-title":"Skeleton-Based Action Recognition With Directed Graph Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019","author":"Shi Lei","year":"2019","unstructured":"Lei Shi , Yifan Zhang , Jian Cheng , and Hanqing Lu . 2019 . Skeleton-Based Action Recognition With Directed Graph Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019 , Long Beach, CA, USA, June 16--20 , 2019. Computer Vision Foundation \/ IEEE, 7912--7921. Lei Shi, Yifan Zhang, Jian Cheng, and Hanqing Lu. 2019. Skeleton-Based Action Recognition With Directed Graph Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16--20, 2019. Computer Vision Foundation \/ IEEE, 7912--7921."},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6--11 July 2015 (JMLR Workshop and Conference Proceedings","volume":"852","author":"Srivastava Nitish","year":"2015","unstructured":"Nitish Srivastava , Elman Mansimov , and Ruslan Salakhutdinov . 2015 . Unsupervised Learning of Video Representations using LSTMs . In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6--11 July 2015 (JMLR Workshop and Conference Proceedings , Vol. 37). JMLR.org, 843-- 852 . Nitish Srivastava, Elman Mansimov, and Ruslan Salakhutdinov. 2015. Unsupervised Learning of Video Representations using LSTMs. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6--11 July 2015 (JMLR Workshop and Conference Proceedings, Vol. 37). JMLR.org, 843--852."},{"key":"e_1_3_2_1_23_1","volume-title":"VL-BERT: Pre-training of Generic Visual-Linguistic Representations. In 8th International Conference on Learning Representations, ICLR 2020","author":"Su Weijie","year":"2020","unstructured":"Weijie Su , Xizhou Zhu , Yue Cao , Bin Li , Lewei Lu , Furu Wei , and Jifeng Dai . 2020 . VL-BERT: Pre-training of Generic Visual-Linguistic Representations. In 8th International Conference on Learning Representations, ICLR 2020 , Addis Ababa, Ethiopia, April 26--30 , 2020. OpenReview.net. Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2020. VL-BERT: Pre-training of Generic Visual-Linguistic Representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26--30, 2020. OpenReview.net."},{"key":"e_1_3_2_1_24_1","volume-title":"VideoBERT: A Joint Model for Video and Language Representation Learning. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019","author":"Sun Chen","year":"2019","unstructured":"Chen Sun , Austin Myers , Carl Vondrick , Kevin Murphy , and Cordelia Schmid . 2019 . VideoBERT: A Joint Model for Video and Language Representation Learning. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019 , Seoul, Korea (South), October 27 - November 2, 2019. IEEE, 7463--7472. Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. 2019. VideoBERT: A Joint Model for Video and Language Representation Learning. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. IEEE, 7463--7472."},{"key":"e_1_3_2_1_25_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017 . Attention is All you Need . In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 , 4--9 December 2017, Long Beach, CA, USA. 5998--6008. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4--9 December 2017, Long Beach, CA, USA. 5998--6008."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.82"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.281"},{"key":"e_1_3_2_1_28_1","volume-title":"BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. CoRR abs\/1902.04094","author":"Wang Alex","year":"2019","unstructured":"Alex Wang and Kyunghyun Cho . 2019. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. CoRR abs\/1902.04094 ( 2019 ). Alex Wang and Kyunghyun Cho. 2019. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. CoRR abs\/1902.04094 (2019)."},{"key":"e_1_3_2_1_29_1","volume-title":"Action Recognition with Improved Trajectories. In IEEE International Conference on Computer Vision, ICCV 2013","author":"Wang Heng","year":"2013","unstructured":"Heng Wang and Cordelia Schmid . 2013 . Action Recognition with Improved Trajectories. In IEEE International Conference on Computer Vision, ICCV 2013 , Sydney, Australia, December 1--8 , 2013. IEEE Computer Society, 3551--3558. Heng Wang and Cordelia Schmid. 2013. Action Recognition with Improved Trajectories. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1--8, 2013. IEEE Computer Society, 3551--3558."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00413"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.233"},{"key":"e_1_3_2_1_33_1","volume-title":"Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019","author":"Zhao Rui","year":"2019","unstructured":"Rui Zhao , Kang Wang , Hui Su , and Qiang Ji . 2019 . Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019 , Seoul, Korea (South), October 27 - November 2, 2019. IEEE, 6881--6891. Rui Zhao, Kang Wang, Hui Su, and Qiang Ji. 2019. Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition. In 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. IEEE, 6881--6891."},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence","author":"Zheng Nenggan","year":"2018","unstructured":"Nenggan Zheng , Jun Wen , Risheng Liu , Liangqu Long , Jianhua Dai , and Zhefeng Gong . 2018. Unsupervised Representation Learning With Long-Term Dynamics for Skeleton Based Action Recognition . In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence , New Orleans , Louisiana, USA, February 2--7, 2018 . AAAI Press , 2644--2651. Nenggan Zheng, Jun Wen, Risheng Liu, Liangqu Long, Jianhua Dai, and Zhefeng Gong. 2018. Unsupervised Representation Learning With Long-Term Dynamics for Skeleton Based Action Recognition. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2--7, 2018. AAAI Press, 2644--2651."}],"event":{"name":"MMAsia '20: ACM Multimedia Asia","location":"Virtual Event Singapore","acronym":"MMAsia '20","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2nd ACM International Conference on Multimedia in Asia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3444685.3446289","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3444685.3446289","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:03:19Z","timestamp":1750197799000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3444685.3446289"}},"subtitle":["self-supervised pre-training for skeleton-based action recognition"],"short-title":[],"issued":{"date-parts":[[2021,3,7]]},"references-count":34,"alternative-id":["10.1145\/3444685.3446289","10.1145\/3444685"],"URL":"https:\/\/doi.org\/10.1145\/3444685.3446289","relation":{},"subject":[],"published":{"date-parts":[[2021,3,7]]},"assertion":[{"value":"2021-05-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}