{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:48:24Z","timestamp":1750308504802,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":19,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T00:00:00Z","timestamp":1648166400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,3,25]]},"DOI":"10.1145\/3546607.3546616","type":"proceedings-article","created":{"date-parts":[[2022,8,25]],"date-time":"2022-08-25T13:40:48Z","timestamp":1661434848000},"page":"56-63","source":"Crossref","is-referenced-by-count":2,"title":["The Application of Vision Transformer in Image Classification"],"prefix":"10.1145","author":[{"given":"Zhixuan","family":"He","sequence":"first","affiliation":[{"name":"University of Hong Kong, China"}]}],"member":"320","published-online":{"date-parts":[[2022,8,25]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_3_2_1_1_1","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_2_1","volume-title":"Springer","author":"Fukushima Kunihiko","year":"1982","unstructured":"Kunihiko Fukushima and Sei Miyake . Neocognitron : A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets, pages 267\u2013285 . Springer , 1982 . Kunihiko Fukushima and Sei Miyake. Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets, pages 267\u2013285. Springer, 1982."},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_3_1","DOI":"10.1162\/neco.1989.1.4.541"},{"key":"e_1_3_2_1_4_1","first-page":"6008","volume-title":"Advances in neural information processing systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . Attention is all you need . In Advances in neural information processing systems , pages 5998\u2013 6008 , 2017 . Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998\u20136008, 2017."},{"key":"e_1_3_2_1_5_1","volume-title":"An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929","author":"Dosovitskiy Alexey","year":"2020","unstructured":"Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly , An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 , 2020 . Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020."},{"key":"e_1_3_2_1_6_1","volume-title":"Crossvit: Cross-attention multi-scale vision transformer for image classification. arXiv preprint arXiv:2103.14899","author":"Chen Chun-Fu","year":"2021","unstructured":"Chun-Fu Chen , Quanfu Fan , and Rameswar Panda . Crossvit: Cross-attention multi-scale vision transformer for image classification. arXiv preprint arXiv:2103.14899 , 2021 . Chun-Fu Chen, Quanfu Fan, and Rameswar Panda. Crossvit: Cross-attention multi-scale vision transformer for image classification. arXiv preprint arXiv:2103.14899, 2021."},{"key":"e_1_3_2_1_7_1","volume-title":"Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122","author":"Wang Wenhai","year":"2021","unstructured":"Wenhai Wang , Enze Xie , Xiang Li , Deng-Ping Fan , Kaitao Song , Ding Liang , Tong Lu , Ping Luo , and Ling Shao . Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122 , 2021 . Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122, 2021."},{"key":"e_1_3_2_1_8_1","volume-title":"Swin transformer: Hierarchical vision transformer using shifted windows","author":"Liu Z.","year":"2021","unstructured":"Z. Liu , Y. Lin , Y. Cao , H. Hu , Y. Wei , Z. Zhang , S. Lin , and B. Guo . Swin transformer: Hierarchical vision transformer using shifted windows . 2021 . Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin transformer: Hierarchical vision transformer using shifted windows. 2021."},{"key":"e_1_3_2_1_9_1","first-page":"3649","volume-title":"2012 IEEE conference on computer vision and pattern recognition","author":"Ciregan Dan","unstructured":"Dan Ciregan , Ueli Meier , and J\u00fcrgen Schmidhuber . Multi-column deep neural networks for image classification . In 2012 IEEE conference on computer vision and pattern recognition , pages 3642\u2013 3649 . IEEE, 2012. Dan Ciregan, Ueli Meier, and J\u00fcrgen Schmidhuber. Multi-column deep neural networks for image classification. In 2012 IEEE conference on computer vision and pattern recognition, pages 3642\u20133649. IEEE, 2012."},{"key":"e_1_3_2_1_10_1","volume-title":"Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25:1097\u20131105","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E Hinton . Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25:1097\u20131105 , 2012 . Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25:1097\u20131105, 2012."},{"key":"e_1_3_2_1_11_1","volume-title":"Learning multiple layers of features from tiny images","author":"Krizhevsky Alex","year":"2009","unstructured":"Alex Krizhevsky , Geoffrey Hinton , Learning multiple layers of features from tiny images . 2009 . Alex Krizhevsky, Geoffrey Hinton, Learning multiple layers of features from tiny images. 2009."},{"key":"e_1_3_2_1_12_1","first-page":"6","volume-title":"2017 International Conference on Engineering and Technology (ICET)","author":"Albawi Saad","unstructured":"Saad Albawi , Tareq Abed Mohammed , and Saad Al-Zawi . Understanding of a convolutional neural network . In 2017 International Conference on Engineering and Technology (ICET) , pages 1\u2013 6 . Ieee, 2017. Saad Albawi, Tareq Abed Mohammed, and Saad Al-Zawi. Understanding of a convolutional neural network. In 2017 International Conference on Engineering and Technology (ICET), pages 1\u20136. Ieee, 2017."},{"key":"e_1_3_2_1_13_1","volume-title":"Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101","author":"Loshchilov Ilya","year":"2017","unstructured":"Ilya Loshchilov and Frank Hutter . Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 , 2017 . Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017."},{"key":"e_1_3_2_1_14_1","first-page":"1339","volume-title":"2019 IEEE Intelligent Vehicles Symposium (IV)","author":"Meletis Panagiotis","unstructured":"Panagiotis Meletis and Gijs Dubbelman . On boosting semantic street scene segmentation with weak supervision . In 2019 IEEE Intelligent Vehicles Symposium (IV) , pages 1334\u2013 1339 . IEEE, 2019. Panagiotis Meletis and Gijs Dubbelman. On boosting semantic street scene segmentation with weak supervision. In 2019 IEEE Intelligent Vehicles Symposium (IV), pages 1334\u20131339. IEEE, 2019."},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_15_1","DOI":"10.5555\/3327546.3327555"},{"key":"e_1_3_2_1_16_1","volume-title":"Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transforma- tions of Python+NumPy programs","author":"Bradbury James","year":"2018","unstructured":"James Bradbury , Roy Frostig , Peter Hawkins , Matthew James Johnson , Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transforma- tions of Python+NumPy programs , 2018 . James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transforma- tions of Python+NumPy programs, 2018."},{"key":"e_1_3_2_1_17_1","volume-title":"TensorFlow: Large-scale machine learning on heterogeneous systems","author":"Abadi Mart\u00edn","year":"2015","unstructured":"Mart\u00edn Abadi , Ashish Agarwal , Paul Barham , Eugene Brevdo , Zhifeng Chen , Craig Citro , Greg S. Corrado , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Ian Goodfellow , Andrew Harp , Geoffrey Irving , Michael Isard , Yangqing Jia , Rafal Jozefowicz , Lukasz Kaiser , Manjunath Kudlur , Josh Levenberg , Dandelion Man\u00e9 , Rajat Monga , Sherry Moore , Derek Murray , Chris Olah , Mike Schuster , Jonathon Shlens , Benoit Steiner , Ilya Sutskever , Kunal Talwar , Paul Tucker , Vincent Vanhoucke , Vijay Vasudevan , Fernanda Vi\u00e9gas , Oriol Vinyals , Pete Warden , Martin Wattenberg , Martin Wicke , Yuan Yu , and Xiaoqiang Zheng . TensorFlow: Large-scale machine learning on heterogeneous systems , 2015 . Software available from tensorflow.org. Mart\u00edn Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Man\u00e9, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Vi\u00e9gas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org."},{"key":"e_1_3_2_1_18_1","volume-title":"https:\/\/keras.io","author":"Keras Fran\u00e7ois Chollet","year":"2015","unstructured":"Fran\u00e7ois Chollet Keras . https:\/\/keras.io , 2015 . Fran\u00e7ois Chollet Keras. https:\/\/keras.io, 2015."},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_19_1","DOI":"10.1109\/CVPR.2015.7298594"}],"event":{"acronym":"ICVARS 2022","name":"ICVARS 2022: 2022 the 6th International Conference on Virtual and Augmented Reality Simulations","location":"Brisbane QLD Australia"},"container-title":["2022 the 6th International Conference on Virtual and Augmented Reality Simulations"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3546607.3546616","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3546607.3546616","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:44:02Z","timestamp":1750272242000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3546607.3546616"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,25]]},"references-count":19,"alternative-id":["10.1145\/3546607.3546616","10.1145\/3546607"],"URL":"https:\/\/doi.org\/10.1145\/3546607.3546616","relation":{},"subject":[],"published":{"date-parts":[[2022,3,25]]}}}