{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T11:28:44Z","timestamp":1775734124440,"version":"3.50.1"},"reference-count":47,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2022,7,6]],"date-time":"2022-07-06T00:00:00Z","timestamp":1657065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Infrared small target detection occupies an important position in infrared search and track systems. The most common infrared image size has grown to 640\u00d7512, and the field of view (FOV) has also increased significantly. As a result, images contain more interference that hinders the detection of small targets. However, traditional model-driven methods lack the capability of feature learning, resulting in poor adaptability to various scenes. Owing to the locality of convolution kernels, recent convolutional neural networks (CNNs) cannot model the long-range dependencies in the image that help suppress false alarms. In this paper, we propose a hierarchical vision transformer-based method for infrared small target detection in larger 640\u00d7512 images with a wider FOV. Specifically, we design a hierarchical overlapped small patch transformer (HOSPT), instead of a CNN, to encode multi-scale features from a single-frame image. For the decoder, a top-down feature aggregation module (TFAM) is adopted to fuse features from adjacent scales. Furthermore, after analyzing existing loss functions, we adopt a simple yet effective combination of them to improve network convergence. Compared with other state-of-the-art methods, the proposed method reaches a normalized intersection-over-union (nIoU) of 0.856 on our IRST640 dataset and 0.758 on the public SIRST dataset. 
Detailed ablation experiments are conducted to validate the effectiveness and soundness of each component of the method.<\/jats:p>","DOI":"10.3390\/rs14143258","type":"journal-article","created":{"date-parts":[[2022,7,6]],"date-time":"2022-07-06T21:15:52Z","timestamp":1657142152000},"page":"3258","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":64,"title":["IRSTFormer: A Hierarchical Vision Transformer for Infrared Small Target Detection"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5646-9970","authenticated-orcid":false,"given":"Gao","family":"Chen","sequence":"first","affiliation":[{"name":"National Key Laboratory of Science and Technology on Automatic Target Recognition, College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Weihua","family":"Wang","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Science and Technology on Automatic Target Recognition, College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Sirui","family":"Tan","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Science and Technology on Automatic Target Recognition, College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Tartakovsky, A.G., Kligys, S., and Petrov, A. (1999, January 4). Adaptive sequential algorithms for detecting targets in a heavy IR clutter. 
Proceedings of the Signal and Data Processing of Small Targets 1999, Denver, CO, USA.","DOI":"10.1117\/12.364013"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"5039","DOI":"10.1109\/JSTARS.2018.2877501","article-title":"Robust infrared small target detection using multiscale gray and variance difference measures","volume":"11","author":"Gao","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"103657","DOI":"10.1016\/j.infrared.2021.103657","article-title":"Infrared maritime dim small target detection based on spatiotemporal cues and directional morphological filtering","volume":"115","author":"Li","year":"2021","journal-title":"Infrared Phys. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Tom, V.T., Peli, T., Leung, M., and Bondaryk, J.E. (1993, January 12\u201314). Morphology-based algorithm for point target detection in infrared backgrounds. Proceedings of the Signal and Data Processing of Small Targets, Orlando, FL, USA.","DOI":"10.1117\/12.157758"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Deshpande, S.D., Er, M.H., Venkateswarlu, R., and Chan, P. (1999, January 20\u201322). Max-mean and max-median filters for detection of small targets. Proceedings of the Signal and Data Processing of Small Targets, Denver, CO, USA.","DOI":"10.1117\/12.364049"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1822","DOI":"10.1109\/LGRS.2019.2954578","article-title":"A local contrast method for infrared small-target detection utilizing a tri-layer window","volume":"17","author":"Han","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"612","DOI":"10.1109\/LGRS.2018.2790909","article-title":"Infrared small target detection utilizing the multiscale relative local contrast measure","volume":"15","author":"Han","year":"2018","journal-title":"IEEE Geosci. 
Remote Sens. Lett."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/j.infrared.2019.06.003","article-title":"Small infrared target detection using absolute average difference weighted by cumulative directional derivatives","volume":"101","author":"Aghaziyarati","year":"2019","journal-title":"Infrared Phys. Technol."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhang, L., and Peng, Z. (2019). Infrared small target detection based on partial sum of the tensor nuclear norm. Remote Sens., 11.","DOI":"10.3390\/rs11040382"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/MAES.2012.6196254","article-title":"Small infrared target detection using sparse ring representation","volume":"27","author":"Gao","year":"2012","journal-title":"IEEE Aerosp. Electron. Syst. Mag."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1016\/j.infrared.2016.06.021","article-title":"Infrared small target and background separation via column-wise weighted robust principal component analysis","volume":"77","author":"Dai","year":"2016","journal-title":"Infrared Phys. Technol."},{"key":"ref_12","unstructured":"Wang, H., Zhou, L., and Wang, L. (2019, October 27\u2013November 2). Miss detection vs. false alarm: Adversarial learning for small object segmentation in infrared images. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Dai, Y., Wu, Y., Zhou, F., and Barnard, K. (2021, January 3\u20138). Asymmetric contextual modulation for infrared small target detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00099"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_15","unstructured":"Zhu, Z., Xu, M., Bai, S., Huang, T., and Bai, X. (2019, October 27\u2013November 2). Asymmetric non-local neural networks for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J., Lin, S., Wei, F., and Hu, H. (2019, January 27\u201328). Gcnet: Non-local networks meet squeeze-excitation networks and beyond. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_17","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_18","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"103659","DOI":"10.1016\/j.infrared.2021.103659","article-title":"ISTDet: An efficient end-to-end neural network for infrared small target detection","volume":"114","author":"Ju","year":"2021","journal-title":"Infrared Phys. Technol."},{"key":"ref_20","first-page":"3000412","article-title":"A Spatial-Temporal Feature-Based Detection Framework for Infrared Dim Small Target","volume":"60","author":"Du","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"102949","DOI":"10.1016\/j.dsp.2020.102949","article-title":"Detection and tracking of infrared small target by jointly using SSD and pipeline filter","volume":"110","author":"Ding","year":"2021","journal-title":"Digit. 
Signal Process."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, G., and Wang, W. (2020). Target recognition in infrared circumferential scanning system via deep convolutional neural networks. Sensors, 20.","DOI":"10.3390\/s20071922"},{"key":"ref_23","first-page":"1","article-title":"Infrared small UAV target detection based on residual image prediction via global and local dilated residual networks","volume":"19","author":"Fang","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_24","unstructured":"Zhao, M., Cheng, L., Yang, X., Feng, P., Liu, L., and Wu, N. (2019). TBC-Net: A real-time detector for infrared small target detection using semantic constraint. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"4481","DOI":"10.1109\/TGRS.2020.3012981","article-title":"A novel pattern for infrared small target detection with generative adversarial network","volume":"59","author":"Zhao","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"9813","DOI":"10.1109\/TGRS.2020.3044958","article-title":"Attentional local contrast networks for infrared small target detection","volume":"59","author":"Dai","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/LGRS.2022.3141584","article-title":"RISTDnet: Robust infrared small target detection network","volume":"19","author":"Hou","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_28","unstructured":"Zhang, T., Cao, S., Pu, T., and Peng, Z. (2021). AGPCNet: Attention-Guided Pyramid Context Networks for Infrared Small Target Detection. 
arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"103755","DOI":"10.1016\/j.infrared.2021.103755","article-title":"Infrared small target segmentation with multiscale feature representation","volume":"116","author":"Huang","year":"2021","journal-title":"Infrared Phys. Technol."},{"key":"ref_30","first-page":"91","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1016\/j.patcog.2016.04.002","article-title":"Multiscale patch-based contrast measure for small infrared target detection","volume":"58","author":"Wei","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, W., Hu, X., and Yang, J. (2019, January 15\u201320). Selective kernel networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00060"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Dai, Y., Oehmcke, S., Gieseke, F., Wu, Y., and Barnard, K. (2021, January 10\u201315). Attention as activation. Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy.","DOI":"10.1109\/ICPR48806.2021.9413020"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Dai, Y., Gieseke, F., Oehmcke, S., Wu, Y., and Barnard, K. (2021, January 3\u20138). Attentional feature fusion. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00360"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_39","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Yang, F., Yang, H., Fu, J., Lu, H., and Guo, B. (2020, January 13\u201319). Learning texture transformer network for image super-resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00583"},{"key":"ref_41","unstructured":"Zhou, H.Y., Guo, J., Zhang, Y., Yu, L., Wang, L., and Yu, Y. (2021). nnFormer: Interleaved Transformer for Volumetric Segmentation. 
arXiv."},{"key":"ref_42","first-page":"1","article-title":"Multistage attention ResU-Net for semantic segmentation of fine-resolution remote sensing images","volume":"19","author":"Li","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_43","unstructured":"Liu, F., Gao, C., Chen, F., Meng, D., Zuo, W., and Gao, X. (2021). Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Rahman, M.A., and Wang, Y. (2016, January 12\u201314). Optimizing intersection-over-union in deep neural networks for image segmentation. Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, USA.","DOI":"10.1007\/978-3-319-50835-1_22"},{"key":"ref_46","unstructured":"Li, B., Xiao, C., Wang, L., Wang, Y., Lin, Z., Li, M., An, W., and Guo, Y. (2021). Dense nested attention network for infrared small target detection. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., and Lin, D. (2019, January 15\u201320). Libra r-cnn: Towards balanced learning for object detection. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00091"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/14\/3258\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:43:30Z","timestamp":1760139810000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/14\/3258"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,6]]},"references-count":47,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["rs14143258"],"URL":"https:\/\/doi.org\/10.3390\/rs14143258","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,6]]}}}