{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T10:14:43Z","timestamp":1766484883740,"version":"3.48.0"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T00:00:00Z","timestamp":1766448000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T00:00:00Z","timestamp":1766448000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton. Intell. Syst."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision tasks. However these models are memory-consuming and computation-intensive, making their deployment and efficient inference on edge devices challenging. Model quantization is a promising approach to reduce model complexity. Prior works have explored tailored quantization algorithms for ViTs but unfortunately retained floating-point (FP) scaling factors, which not only yield non-negligible re-quantization overhead, but also hinder the quantized models to perform efficient integer-only inference. In this paper, we propose H-ViT, a dedicated post-training quantization scheme (e.g., symmetric uniform quantization and layer-wise quantization for both weights and part of activations) to effectively quantize ViTs with fewer Power-of-Two (PoT) scaling factors, thus minimizing the re-quantization overhead and memory consumption. In addition, observing serious inter-channel variation in LayerNorm inputs and outputs, we propose Power-of-Two quantization (PTQ), a systematic method to reducing the performance degradation without hyper-parameters. Extensive experiments are conducted on multiple vision tasks with different model variants, proving that H-ViT offers comparable(or even slightly higher) INT8 quantization performance with PoT scaling factors when compared to the counterpart with floating-point scaling factors. For instance, we reach 78.43 top-1 accuracy with DeiT-S on ImageNet, 51.6 box AP and 44.8 mask AP with Cascade Mask R-CNN (Swin-B) on COCO.<\/jats:p>","DOI":"10.1007\/s43684-025-00121-0","type":"journal-article","created":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T10:12:05Z","timestamp":1766484725000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["H-ViT: hardware-friendly post-training quantization for efficient vision transformer inference"],"prefix":"10.1007","volume":"5","author":[{"given":"Jing","family":"Liu","sequence":"first","affiliation":[]},{"given":"Jiaqi","family":"Lai","sequence":"additional","affiliation":[]},{"given":"Xiaodong","family":"Deng","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1342-4094","authenticated-orcid":false,"given":"Caigui","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Nanning","family":"Zheng","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,12,23]]},"reference":[{"key":"121_CR1","volume-title":"International Conference on Learning Representations","author":"A. 
Dosovitskiy","year":"2021","unstructured":"A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words: transformers for image recognition at scale, in International Conference on Learning Representations (2021). https:\/\/openreview.net\/forum?id=YicbFdNTTy"},{"key":"121_CR2","volume-title":"2021 IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Z. Liu","year":"2021","unstructured":"Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: hierarchical vision transformer using shifted windows, in 2021 IEEE\/CVF International Conference on Computer Vision (ICCV) (2021)"},{"key":"121_CR3","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1007\/978-3-030-58452-8_13","volume-title":"Computer Vision \u2013 ECCV 2020","author":"N. Carion","year":"2020","unstructured":"N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers, in Computer Vision \u2013 ECCV 2020. Lecture Notes in Computer Science (2020), pp. 213\u2013229"},{"key":"121_CR4","unstructured":"X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, Deformable detr: deformable transformers for end-to-end object detection. arXiv: Computer Vision and Pattern Recognition 2020)"},{"key":"121_CR5","volume-title":"2021 IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"R. Strudel","year":"2021","unstructured":"R. Strudel, R. Garcia, I. Laptev, C. Schmid, Segmenter: transformer for semantic segmentation, in 2021 IEEE\/CVF International Conference on Computer Vision (ICCV) (2021)"},{"issue":"1","key":"121_CR6","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1109\/TPAMI.2022.3152247","volume":"45","author":"K. Han","year":"2023","unstructured":"K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, Z. Yang, Y. Zhang, D. Tao, A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 87\u2013110 (2023)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"121_CR7","doi-asserted-by":"publisher","first-page":"3668","DOI":"10.1109\/CVPRW56347.2022.00411","volume-title":"2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","author":"Z. Hou","year":"2022","unstructured":"Z. Hou, S.-Y. Kung, Multi-dimensional vision transformer compression via dependency guided Gaussian process search, in 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2022), pp. 3668\u20133677"},{"key":"121_CR8","unstructured":"Y. Tang, K. Han, Y. Wang, C. Xu, J. Guo, C. Xu, D. Tao, Patch Slimming for Efficient Vision Transformers (2021) arXiv e-prints, 2106\u201302852"},{"key":"121_CR9","unstructured":"R. Krishnamoorthi, Quantizing deep convolutional networks for efficient inference: a whitepaper (2018). arXiv preprint arXiv:1806.08342"},{"key":"121_CR10","unstructured":"J. Choi, Z. Wang, S. Venkataramani, P. I-Jen Chuang, V. Srinivasan, K. Gopalakrishnan, PACT: parameterized Clipping Activation for Quantized Neural Networks (2018) arXiv e-prints, 1805\u201306085"},{"key":"121_CR11","unstructured":"S.K. Esser, J.L. McKinstry, D. Bablani, R. Appuswamy, D.S. Modha, Learned step size quantization (2019). 
arXiv preprint arXiv:1902.08153"},{"key":"121_CR12","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1016\/j.neucom.2022.09.076","volume":"511","author":"Z. Li","year":"2022","unstructured":"Z. Li, L. Ma, X. Long, J. Xiao, Q. Gu, Dual-discriminator adversarial framework for data-free quantization. Neurocomputing 511, 67\u201377 (2022)","journal-title":"Neurocomputing"},{"key":"121_CR13","unstructured":"X. Wei, R. Gong, Y. Li, X. Liu, F. Yu, Qdrop: randomly dropping quantization for extremely low-bit post-training quantization (2022). arXiv preprint arXiv:2203.05740"},{"key":"121_CR14","unstructured":"Y. Li, R. Gong, X. Tan, Y. Yang, P. Hu, Q. Zhang, F. Yu, W. Wang, S. Gu, Brecq: pushing the limit of post-training quantization by block reconstruction (2021). arXiv preprint arXiv:2102.05426"},{"key":"121_CR15","first-page":"7197","volume-title":"International Conference on Machine Learning","author":"M. Nagel","year":"2020","unstructured":"M. Nagel, R.A. Amjad, M. Van Baalen, C. Louizos, T. Blankevoort, Up or down? Adaptive rounding for post-training quantization, in International Conference on Machine Learning (2020), pp. 7197\u20137206"},{"key":"121_CR16","first-page":"9847","volume-title":"International Conference on Machine Learning","author":"P. Wang","year":"2020","unstructured":"P. Wang, Q. Chen, X. He, J. Cheng, Towards accurate post-training network quantization via bit-split and stitching, in International Conference on Machine Learning (2020), pp. 9847\u20139856"},{"key":"121_CR17","volume-title":"2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"B. Jacob","year":"2018","unstructured":"B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, in 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (2018)"},{"key":"121_CR18","unstructured":"Z. Yao, Z. Dong, Z. Zheng, A. Gholami, J. Yu, E. Tan, L. Wang, Q. Huang, Y. Wang, M.W. Mahoney, K. Keutzer, HAWQV3: Dyadic Neural Network Quantization (2020). arXiv e-prints, arXiv:2011.10680"},{"key":"121_CR19","first-page":"17227","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Z. Li","year":"2023","unstructured":"Z. Li, J. Xiao, L. Yang, Q. Gu, Repq-vit: scale reparameterization for post-training quantization of vision transformers, in Proceedings of the IEEE\/CVF International Conference on Computer Vision (2023), pp. 17227\u201317236"},{"key":"121_CR20","first-page":"1173","volume-title":"Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22","author":"Y. Lin","year":"2022","unstructured":"Y. Lin, T. Zhang, P. Sun, Z. Li, S. Zhou, Fq-vit: post-training quantization for fully quantized vision transformer, in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22 (2022), pp. 1173\u20131179"},{"key":"121_CR21","doi-asserted-by":"crossref","unstructured":"Z. Yuan, C. Xue, Y. Chen, Q. Wu, G. Sun, Ptq4vit: post-training quantization framework for vision transformers (2022). arXiv preprint arXiv:2111.12293","DOI":"10.1007\/978-3-031-19775-8_12"},{"key":"121_CR22","unstructured":"Z. Liu, Y. Wang, K. Han, S. Ma, W. Gao, Post-Training Quantization for Vision Transformer. arXiv e-prints, arXiv:2106.14156 (2021)"},{"key":"121_CR23","first-page":"28092","volume":"34","author":"Z. Liu","year":"2021","unstructured":"Z. Liu, Y. Wang, K. Han, W. 
Zhang, S. Ma, W. Gao, Post-training quantization for vision transformer. Adv. Neural Inf. Process. Syst. 34, 28092\u201328103 (2021)","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"121_CR24","first-page":"20321","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Y. Liu","year":"2023","unstructured":"Y. Liu, H. Yang, Z. Dong, K. Keutzer, L. Du, S. Zhang, Noisyquant: noisy bias-enhanced post-training activation quantization for vision transformers, in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (2023), pp. 20321\u201320330"},{"issue":"12","key":"121_CR25","doi-asserted-by":"publisher","first-page":"5289","DOI":"10.1109\/TCSI.2023.3312775","volume":"70","author":"M. Huang","year":"2023","unstructured":"M. Huang, J. Luo, C. Ding, Z. Wei, S. Huang, H. Yu, An integer-only and group-vector systolic accelerator for efficiently mapping vision transformer on edge. IEEE Trans. Circuits Syst. I, Regul. Pap. 70(12), 5289\u20135301 (2023)","journal-title":"IEEE Trans. Circuits Syst. I, Regul. Pap."},{"key":"121_CR26","doi-asserted-by":"crossref","unstructured":"H. Yao, P. Li, J. Cao, X. Liu, C. Xie, B. Wang, Rapq: rescuing accuracy for power-of-two low-bit post-training quantization (2022). arXiv preprint arXiv:2204.12322","DOI":"10.24963\/ijcai.2022\/219"},{"key":"121_CR27","first-page":"112","volume":"2","author":"S. Jain","year":"2020","unstructured":"S. Jain, A. Gural, M. Wu, C. Dick, Trained quantization thresholds for accurate and efficient fixed-point inference of deep neural networks. Proc. Mach. Learn. Syst. 2, 112\u2013128 (2020)","journal-title":"Proc. Mach. Learn. Syst."},{"key":"121_CR28","first-page":"17065","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Z. Li","year":"2023","unstructured":"Z. Li, Q. Gu, I-vit: integer-only quantization for efficient vision transformer inference, in Proceedings of the IEEE\/CVF International Conference on Computer Vision (2023), pp. 17065\u201317075"},{"key":"121_CR29","first-page":"2704","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"B. Jacob","year":"2018","unstructured":"B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 2704\u20132713"},{"issue":"9","key":"121_CR30","doi-asserted-by":"publisher","first-page":"1704","DOI":"10.1109\/TVLSI.2024.3422684","volume":"32","author":"H. Shi","year":"2024","unstructured":"H. Shi, X. Cheng, W. Mao, Z. Wang, P2-vit: power-of-two post-training quantization and acceleration for fully quantized vision transformer. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 32(9), 1704\u20131717 (2024)","journal-title":"IEEE Trans. Very Large Scale Integr. (VLSI) Syst."},{"key":"121_CR31","volume-title":"International Conference on Machine Learning","author":"H. Touvron","year":"2020","unstructured":"H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, H. J\u00e9gou, Training data-efficient image transformers & distillation through attention, in International Conference on Machine Learning (2020)"},{"issue":"12","key":"121_CR32","doi-asserted-by":"publisher","first-page":"17227","DOI":"10.1109\/TNNLS.2023.3301007","volume":"35","author":"Z. Li","year":"2024","unstructured":"Z. Li, M. 
Chen, J. Xiao, Q. Gu, Psaq-vit v2: toward accurate and general data-free quantization for vision transformers. IEEE Trans. Neural Netw. Learn. Syst. 35(12), 17227\u201317238 (2024)","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"121_CR33","doi-asserted-by":"publisher","first-page":"3009","DOI":"10.1109\/ICCVW.2019.00363","volume-title":"2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW)","author":"Y. Choukroun","year":"2019","unstructured":"Y. Choukroun, E. Kravchik, F. Yang, P. Kisilev, Low-bit quantization of neural networks for efficient inference, in 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW) (2019), pp. 3009\u20133018"},{"key":"121_CR34","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1109\/CVPR.2009.5206848","volume-title":"2009 IEEE Conference on Computer Vision and Pattern Recognition","author":"J. Deng","year":"2009","unstructured":"J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: a large-scale hierarchical image database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition (2009), pp. 248\u2013255"},{"key":"121_CR35","first-page":"2810","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"R. Li","year":"2019","unstructured":"R. Li, Y. Wang, F. Liang, H. Qin, J. Yan, R. Fan, Fully quantized network for object detection, in Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (2019), pp. 2810\u20132819"},{"key":"121_CR36","unstructured":"D. Wu, Q. Tang, Y. Zhao, M. Zhang, Y. Fu, D. Zhang, Easyquant: post-training quantization via scale optimization (2020). arXiv preprint arXiv:2006.16669"},{"key":"121_CR37","doi-asserted-by":"publisher","first-page":"5380","DOI":"10.1145\/3503161.3547826","volume-title":"Proceedings of the 30th ACM International Conference on Multimedia","author":"Y. Ding","year":"2022","unstructured":"Y. Ding, H. Qin, Q. Yan, Z. Chai, J. Liu, X. Wei, X. Liu, Towards accurate post-training quantization for vision transformer, in Proceedings of the 30th ACM International Conference on Multimedia (2022), pp. 5380\u20135388"},{"key":"121_CR38","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1007\/978-3-319-10602-1_48","volume-title":"Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V","author":"T.-Y. Lin","year":"2014","unstructured":"T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll\u00e1r, C.L. Zitnick, Microsoft COCO: common objects in context, in Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, vol.\u00a013 (2014), pp. 740\u2013755"},{"key":"121_CR39","unstructured":"AMD\/Xilinx, Versal ACAP AI Engine architecture manual (AM009) (2021). 
https:\/\/docs.amd.com\/r\/en-US\/am009-versal-ai-engine\/Revision-History"}],"container-title":["Autonomous Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43684-025-00121-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s43684-025-00121-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s43684-025-00121-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T10:12:10Z","timestamp":1766484730000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s43684-025-00121-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,23]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["121"],"URL":"https:\/\/doi.org\/10.1007\/s43684-025-00121-0","relation":{},"ISSN":["2730-616X"],"issn-type":[{"value":"2730-616X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,23]]},"assertion":[{"value":"25 August 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 November 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 December 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 December 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"32"}}