{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T02:58:14Z","timestamp":1764212294435,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,12,23]],"date-time":"2022-12-23T00:00:00Z","timestamp":1671753600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,12,23]]},"DOI":"10.1145\/3579109.3579133","type":"proceedings-article","created":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T07:46:29Z","timestamp":1678779989000},"page":"140-143","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Pseudo Random Masked AutoEncoder for Self-supervised Learning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0277-2584","authenticated-orcid":false,"given":"Li","family":"Tian","sequence":"first","affiliation":[{"name":"Guangdong Mechanical and Electrical Polytechnic, China and South China University of Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3510-3731","authenticated-orcid":false,"given":"Yuan","family":"Cheng","sequence":"additional","affiliation":[{"name":"Guangdong Mechanical and Electrical Polytechnic, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2199-5604","authenticated-orcid":false,"given":"Zhibin","family":"Li","sequence":"additional","affiliation":[{"name":"Guangdong Mechanical and Electrical Polytechnic, China"}]}],"member":"320","published-online":{"date-parts":[[2023,3,14]]},"reference":[
{"key":"e_1_3_2_1_1_1","volume-title":"NeurIPS","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Language models are few-shot learners. NeurIPS, 2020."},
{"unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. 2018.","key":"e_1_3_2_1_2_1"},
{"key":"e_1_3_2_1_3_1","volume-title":"Language models are unsupervised multitask learners. OpenAI blog","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, Language models are unsupervised multitask learners. OpenAI blog, 2019."},
{"key":"e_1_3_2_1_4_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018."},
{"key":"e_1_3_2_1_5_1","volume-title":"Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. arXiv preprint arXiv:2203.12602","author":"Tong Zhan","year":"2022","unstructured":"Zhan Tong, Yibing Song, Jue Wang, and Limin Wang. Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. arXiv preprint arXiv:2203.12602, 2022."},
{"key":"e_1_3_2_1_6_1","volume-title":"Masked feature prediction for self-supervised visual pre-training. arXiv preprint arXiv:2112.09133","author":"Wei Chen","year":"2021","unstructured":"Chen Wei, Haoqi Fan, Saining Xie, Chao-Yuan Wu, Alan Yuille, and Christoph Feichtenhofer. Masked feature prediction for self-supervised visual pre-training. arXiv preprint arXiv:2112.09133, 2021."},
{"key":"e_1_3_2_1_7_1","volume-title":"CVPR","author":"Xie Zhenda","year":"2021","unstructured":"Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, and Han Hu. Simmim: A simple framework for masked image modeling. In CVPR, 2021."},
{"key":"e_1_3_2_1_8_1","volume-title":"CVPR","author":"He Kaiming","year":"2021","unstructured":"Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll\u00e1r, and Ross Girshick. Masked autoencoders are scalable vision learners. In CVPR, 2021."},
{"key":"e_1_3_2_1_9_1","volume-title":"Context autoencoder for self-supervised representation learning. arXiv preprint arXiv:2202.03026","author":"Chen Xiaokang","year":"2022","unstructured":"Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, and Jingdong Wang. Context autoencoder for self-supervised representation learning. arXiv preprint arXiv:2202.03026, 2022."},
{"key":"e_1_3_2_1_10_1","volume-title":"ibot: Image bert pre-training with online tokenizer. arXiv preprint arXiv:2111.07832","author":"Zhou Jinghao","year":"2021","unstructured":"Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, and Tao Kong. ibot: Image bert pre-training with online tokenizer. arXiv preprint arXiv:2111.07832, 2021."},
{"key":"e_1_3_2_1_11_1","volume-title":"Beit: Bert pre-training of image transformers. arXiv preprint arXiv:2106.08254","author":"Bao Hangbo","year":"2021","unstructured":"Hangbo Bao, Li Dong, and Furu Wei. Beit: Bert pre-training of image transformers. arXiv preprint arXiv:2106.08254, 2021."},
{"key":"e_1_3_2_1_12_1","volume-title":"Peco: Perceptual codebook for bert pre-training of vision transformers. arXiv preprint arXiv:2111.12710","author":"Dong Xiaoyi","year":"2021","unstructured":"Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, and Nenghai Yu. Peco: Perceptual codebook for bert pre-training of vision transformers. arXiv preprint arXiv:2111.12710, 2021."},
{"key":"e_1_3_2_1_13_1","first-page":"255","volume-title":"2009 IEEE conference on computer vision and pattern recognition","author":"Deng Jia","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248\u2013255. 2009."},
{"key":"e_1_3_2_1_14_1","volume-title":"BEiT: BERT pre-training of image transformers","author":"Bao Hangbo","year":"2021","unstructured":"Hangbo Bao, Li Dong, and Furu Wei. BEiT: BERT pre-training of image transformers. 2021."},
{"key":"e_1_3_2_1_15_1","volume-title":"An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929","author":"Dosovitskiy Alexey","year":"2020","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020."},
{"key":"e_1_3_2_1_16_1","volume-title":"An empirical study of training self-supervised vision transformers. arXiv preprint arXiv:2104.02057","author":"Chen Xinlei","year":"2021","unstructured":"Xinlei Chen, Saining Xie, and Kaiming He. An empirical study of training self-supervised vision transformers. arXiv preprint arXiv:2104.02057, 2021."},
{"key":"e_1_3_2_1_17_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018."},
{"key":"e_1_3_2_1_18_1","first-page":"1986","volume-title":"Yuille","author":"R.","year":"2014","unstructured":"Chen, X., Mottaghi, R., Liu, X., Fidler, S., Urtasun, R., Yuille, A.L.: Detect what you can: Detecting and representing objects using holistic models and body parts. In: CVPR. pp. 1979\u20131986 (2014)"},
{"issue":"9","key":"e_1_3_2_1_19_1","first-page":"1627","volume":"32","author":"Girshick P.F.","year":"2009","unstructured":"Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. TPAMI 32(9), 1627\u20131645 (2009)","journal-title":"TPAMI"},
{"issue":"1","key":"e_1_3_2_1_20_1","first-page":"67","volume":"100","author":"Elschlager M.A.","year":"1973","unstructured":"Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. TC 100(1), 67\u201392 (1973)","journal-title":"TC"},
{"unstructured":"Hinton, G.: How to represent part-whole hierarchies in a neural network. arXiv preprint arXiv:2102.12627 (2021)","key":"e_1_3_2_1_21_1"},
{"key":"e_1_3_2_1_22_1","volume-title":"Yuille","author":"Liu J.N.","year":"2021","unstructured":"He, J., Yang, S., Yang, S., Kortylewski, A., Yuan, X., Chen, J.N., Liu, S., Yang, C., Yuille, A.: Partimagenet: A large, high-quality dataset of parts. arXiv preprint arXiv:2112.00933 (2021)"},
{"key":"e_1_3_2_1_23_1","first-page":"6008","volume-title":"Advances in neural information processing systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998\u20136008, 2017."},
{"key":"e_1_3_2_1_24_1","first-page":"10357","volume-title":"International Conference on Machine Learning","author":"Touvron Hugo","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning, pages 10347\u201310357. PMLR, 2021."},
{"key":"e_1_3_2_1_25_1","volume-title":"Andrew Zhai, and Dmitry Kislyuk. Toward transformer-based object detection. arXiv preprint arXiv:2012.09958","author":"Beal Josh","year":"2020","unstructured":"Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, and Dmitry Kislyuk. Toward transformer-based object detection. arXiv preprint arXiv:2012.09958, 2020."},
{"key":"e_1_3_2_1_26_1","volume-title":"Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030, 2021."},
{"doi-asserted-by":"publisher","key":"e_1_3_2_1_27_1","DOI":"10.1109\/CVPR.2017.544"},
{"key":"e_1_3_2_1_28_1","volume-title":"Emerging properties in self-supervised vision transformers. arXiv preprint arXiv:2104.14294","author":"Caron Mathilde","year":"2021","unstructured":"Mathilde Caron, Hugo Touvron, Ishan Misra, Herv\u00e9 J\u00e9gou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. arXiv preprint arXiv:2104.14294, 2021."},
{"key":"e_1_3_2_1_29_1","volume-title":"AAAI","author":"Paul Sayak","year":"2022","unstructured":"Sayak Paul and Pin-Yu Chen. Vision transformers are robust learners. AAAI, 2022."},
{"key":"e_1_3_2_1_30_1","first-page":"34","year":"2021","unstructured":"Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, and Alexey Dosovitskiy. Do vision transformers see like convolutional neural networks? Advances in Neural Information Processing Systems, 34, 2021.","journal-title":"Advances in Neural Information Processing Systems"},
{"key":"e_1_3_2_1_31_1","volume-title":"ECCV","author":"Xiao Tete","year":"2018","unstructured":"Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, and Jian Sun. Unified perceptual parsing for scene understanding. In ECCV, 2018."},
{"key":"e_1_3_2_1_32_1","volume-title":"ECCV","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014."}
],"event":{"acronym":"ICVIP 2022","name":"ICVIP 2022: 2022 The 6th International Conference on Video and Image Processing","location":"Shanghai, China"},"container-title":["2022 The 6th International Conference on Video and Image Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579109.3579133","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3579109.3579133","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:38:06Z","timestamp":1750178286000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579109.3579133"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,23]]},"references-count":32,"alternative-id":["10.1145\/3579109.3579133","10.1145\/3579109"],"URL":"https:\/\/doi.org\/10.1145\/3579109.3579133","relation":{},"subject":[],"published":{"date-parts":[[2022,12,23]]},"assertion":[{"value":"2023-03-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}