{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T14:50:21Z","timestamp":1769957421265,"version":"3.49.0"},"reference-count":51,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,2,28]],"date-time":"2023-02-28T00:00:00Z","timestamp":1677542400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>The current deep learning-based image fusion methods can not sufficiently learn the features of images in a wide frequency range. Therefore, we proposed IFormerFusion, which is based on the Inception Transformer and cross-domain frequency fusion. To learn features from high- and low-frequency information, we designed the IFormer mixer, which splits the input features through the channel dimension and feeds them into parallel paths for high- and low-frequency mixers to achieve linear computational complexity. The high-frequency mixer adopts a convolution and a max-pooling path, while the low-frequency mixer adopts a criss-cross attention path. Considering that the high-frequency information relates to the texture detail, we designed a cross-domain frequency fusion strategy, which trades high-frequency information between the source images. This structure can sufficiently integrate complementary features and strengthen the capability of texture retaining. Experiments on the TNO, OSU, and Road Scene datasets demonstrate that IFormerFusion outperforms other methods in object and subject evaluations.<\/jats:p>","DOI":"10.3390\/rs15051352","type":"journal-article","created":{"date-parts":[[2023,2,28]],"date-time":"2023-02-28T03:59:57Z","timestamp":1677556797000},"page":"1352","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["IFormerFusion: Cross-Domain Frequency Information Learning for Infrared and Visible Image Fusion Based on the Inception Transformer"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6725-3553","authenticated-orcid":false,"given":"Zhang","family":"Xiong","sequence":"first","affiliation":[{"name":"Department of Weapon Engineering, Naval University of Engineering, Wuhan 430030, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1428-5169","authenticated-orcid":false,"given":"Xiaohui","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Weapon Engineering, Naval University of Engineering, Wuhan 430030, China"}]},{"given":"Qingping","family":"Hu","sequence":"additional","affiliation":[{"name":"Department of Weapon Engineering, Naval University of Engineering, Wuhan 430030, China"}]},{"given":"Hongwei","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Weapon Engineering, Naval University of Engineering, Wuhan 430030, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1109\/TIP.2019.2928126","article-title":"Learning Modality-Specific Representations for Visible-Infrared Person Re-Identification","volume":"29","author":"Feng","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhang, X., Ye, P., Qiao, D., Zhao, J., Peng, S., and Xiao, G. (2019, January 2\u20135). 
Object Fusion Tracking Based on Visible and Infrared Images Using Fully Convolutional Siamese Networks. Proceedings of the 2019 22th International Conference on Information Fusion (FUSION), Ottawa, ON, Canada.","DOI":"10.23919\/FUSION43075.2019.9011253"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.inffus.2020.04.006","article-title":"Pan-GAN: An Unsupervised Pan-Sharpening Method for Remote Sensing Image Fusion","volume":"62","author":"Ma","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_4","first-page":"198","article-title":"Feature Level Image Fusion of Optical Imagery and Synthetic Aperture Radar (SAR) for Invasive Alien Plant Species Detection and Mapping","volume":"10","author":"Rajah","year":"2018","journal-title":"Remote Sens. Appl. Soc. Environ."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2192","DOI":"10.1109\/TMM.2021.3077767","article-title":"CCAFNet: Crossflow and Cross-Scale Adaptive Fusion Network for Detecting Salient Objects in RGB-D Images","volume":"24","author":"Zhou","year":"2021","journal-title":"IEEE Trans. Multimed."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2864","DOI":"10.1109\/TIP.2013.2244222","article-title":"Image Fusion With Guided Filtering","volume":"22","author":"Li","year":"2013","journal-title":"IEEE Trans. Image Process."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1007\/s10044-020-00919-z","article-title":"Infrared and Visible Image Fusion Using Modified Spatial Frequency-Based Clustered Dictionary","volume":"24","author":"Budhiraja","year":"2020","journal-title":"Pattern Anal. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1016\/j.neucom.2016.11.051","article-title":"A Novel Infrared and Visible Image Fusion Algorithm Based on Shift-Invariant Dual-Tree Complex Shearlet Transform and Sparse Representation","volume":"226","author":"Yin","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2274","DOI":"10.1016\/j.ijleo.2013.10.064","article-title":"A Novel Image Fusion Algorithm Based on Nonsubsampled Shearlet Transform","volume":"125","author":"Yin","year":"2014","journal-title":"Optik"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1016\/j.infrared.2018.08.004","article-title":"Infrared and Visible Image Fusion Using Co-Occurrence Filter","volume":"93","author":"Zhang","year":"2018","journal-title":"Infrared Phys. Technol."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Li, H., Wu, X.-J., and Kittler, J. (2018, January 20\u201324). Infrared and Visible Image Fusion Using a Deep Learning Framework. Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China.","DOI":"10.1109\/ICPR.2018.8546006"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1016\/j.inffus.2020.11.009","article-title":"RXDNFuse: A Aggregated Residual Dense Network for Infrared and Visible Image Fusion","volume":"69","author":"Long","year":"2021","journal-title":"Inf. Fusion"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/TIP.2018.2887342","article-title":"DenseFuse: A Fusion Approach to Infrared and Visible Images","volume":"28","author":"Li","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liu, X., Gao, H., Miao, Q., Xi, Y., Ai, Y., and Gao, D. (2022). 
MFST: Multi-Modal Feature Self-Adaptive Transformer for Infrared and Visible Image Fusion. Remote Sens., 14.","DOI":"10.3390\/rs14133233"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.inffus.2018.09.004","article-title":"FusionGAN: A Generative Adversarial Network for Infrared and Visible Image Fusion","volume":"48","author":"Ma","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_16","first-page":"5005014","article-title":"GANMcC: A Generative Adversarial Network With Multiclassification Constraints for Infrared and Visible Image Fusion","volume":"70","author":"Ma","year":"2021","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_17","first-page":"5012314","article-title":"CGTF: Convolution-Guided Transformer for Infrared and Visible Image Fusion","volume":"71","author":"Li","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1109\/JAS.2022.105686","article-title":"SwinFusion: Cross-Domain Long-Range Learning for General Image Fusion via Swin Transformer","volume":"9","author":"Ma","year":"2022","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhai, X., Kolesnikov, A., Houlsby, N., and Beyer, L. (2022, January 18\u201324). Scaling Vision Transformers. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01179"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ding, M., Xiao, B., Codella, N.C.F., Luo, P., Wang, J., and Yuan, L. (2022, January 23\u201327). DaViT: Dual Attention Vision Transformers. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20053-3_5"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_22","unstructured":"Chen, Z., Duan, Y., Wang, W., He, J., Lu, T., Dai, J., and Qiao, Y. (2022). Vision Transformer Adapter for Dense Predictions. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., and Dong, L. (2022, January 18\u201324). Swin Transformer V2: Scaling Up Capacity and Resolution 2022. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01170"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sun, Z., Cao, S., Yang, Y., and Kitani, K. (2020, January 11\u201317). Rethinking Transformer-Based Set Prediction for Object Detection. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00359"},{"key":"ref_25","first-page":"28","article-title":"Remote Sensing and Artificial Intelligence in the Mine Action Sector","volume":"25","author":"Jebens","year":"2021","journal-title":"J. Conv. 
Weapons Destr."},{"key":"ref_26","first-page":"14","article-title":"To What Extent Could the Development of an Airborne Thermal Imaging Detection System Contribute to Enhance Detection?","volume":"24","author":"Jebens","year":"2020","journal-title":"J. Conv. Weapons Destr."},{"key":"ref_27","first-page":"15","article-title":"Proof: How Small Drones Can Find Buried Landmines in the Desert Using Airborne IR Thermography","volume":"24","author":"Fardoulis","year":"2020","journal-title":"J. Conv. Weapons Destr."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Baur, J., Steinberg, G., Nikulin, A., Chiu, K., and de Smet, T.S. (2020). Applying Deep Learning to Automate UAV-Based Detection of Scatterable Landmines. Remote Sens., 12.","DOI":"10.3390\/rs12050859"},{"key":"ref_29","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 12\u201314). An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations (ICLR), Online."},{"key":"ref_30","unstructured":"Si, C., Yu, W., Zhou, P., Zhou, Y., Wang, X., and Yan, S. (2022). Inception Transformer. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"9645","DOI":"10.1109\/TIM.2020.3005230","article-title":"NestFuse: An Infrared and Visible Image Fusion Architecture Based on Nest Connection and Spatial\/Channel Attention Models","volume":"69","author":"Li","year":"2020","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1109\/TCI.2021.3100986","article-title":"Classification Saliency-Based Rule for Visible and Infrared Image Fusion","volume":"7","author":"Xu","year":"2021","journal-title":"IEEE Trans. Comput. Imaging"},{"key":"ref_33","first-page":"5011410","article-title":"Res2Fusion: Infrared and Visible Image Fusion Based on Dense Res2net and Double Nonlocal Attention Models","volume":"71","author":"Wang","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"4980","DOI":"10.1109\/TIP.2020.2977573","article-title":"DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion","volume":"29","author":"Ma","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_35","unstructured":"Rao, D., Wu, X., and Xu, T. (2022). TGFuse: An Infrared and Visible Image Fusion Approach Based on Transformer and Generative Adversarial Network. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going Deeper with Convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep Learning with Depthwise Separable Convolutions. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Shi, H., and Liu, W. (2019, January 2). CCNet: Criss-Cross Attention for Semantic Segmentation. 
Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00069"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image Quality Assessment: From Error Visibility to Structural Similarity","volume":"13","author":"Wang","year":"2014","journal-title":"IEEE Trans. Image Process."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.inffus.2019.07.011","article-title":"IFCNN: A General Image Fusion Framework Based on Convolutional Neural Network","volume":"54","author":"Zhang","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1016\/j.inffus.2021.02.023","article-title":"RFN-Nest: An End-to-End Residual Fusion Network for Infrared and Visible Images","volume":"73","author":"Li","year":"2021","journal-title":"Inf. Fusion"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"5016412","DOI":"10.1109\/TIM.2022.3216413","article-title":"SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images","volume":"71","author":"Wang","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1109\/TPAMI.2020.3012548","article-title":"U2Fusion: A Unified Unsupervised Image Fusion Network","volume":"44","author":"Xu","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.inffus.2022.03.007","article-title":"PIAFusion: A Progressive Infrared and Visible Image Fusion Network Based on Illumination Aware","volume":"83\u201384","author":"Tang","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref_45","unstructured":"Toet, A. (2023, February 21). TNO Image Fusion Dataset. Figshare. Dataset. Available online: https:\/\/figshare.com\/articles\/dataset\/TNO_Image_Fusion_Dataset\/1008029\/1."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1016\/j.cviu.2006.06.010","article-title":"Background-Subtraction Using Contour-Based Fusion of Thermal and Visible Imagery","volume":"106","author":"Davis","year":"2007","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature Selection Based on Mutual Information Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Haghighat, M., and Razian, M.A. (2014, January 15\u201317). Fast-FMI: Non-Reference Image Fusion Metric. Proceedings of the 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), Astana, Kazakhstan.","DOI":"10.1109\/ICAICT.2014.7036000"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1016\/j.inffus.2011.08.002","article-title":"A New Image Fusion Performance Metric Based on Visual Information Fidelity","volume":"14","author":"Han","year":"2013","journal-title":"Inf. 
Fusion"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1890","DOI":"10.1016\/j.aeue.2015.09.004","article-title":"A New Image Quality Metric for Image Fusion: The Sum of the Correlations of Differences","volume":"69","author":"Aslantas","year":"2015","journal-title":"AEU-Int. J. Electron. Commun."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"211301","DOI":"10.1007\/s11432-019-2757-1","article-title":"Perceptual Image Quality Assessment: A Survey","volume":"63","author":"Zhai","year":"2020","journal-title":"Sci. China Inf. Sci."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/5\/1352\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:43:55Z","timestamp":1760121835000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/5\/1352"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,28]]},"references-count":51,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["rs15051352"],"URL":"https:\/\/doi.org\/10.3390\/rs15051352","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,28]]}}}