{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T15:20:46Z","timestamp":1773415246342,"version":"3.50.1"},"reference-count":54,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2023,8,16]],"date-time":"2023-08-16T00:00:00Z","timestamp":1692144000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62002082"],"award-info":[{"award-number":["62002082"]}]},{"name":"National Natural Science Foundation of China","award":["2020GXNSFBA238014"],"award-info":[{"award-number":["2020GXNSFBA238014"]}]},{"name":"National Natural Science Foundation of China","award":["202210595023"],"award-info":[{"award-number":["202210595023"]}]},{"name":"Guangxi Natural Science Foundation","award":["62002082"],"award-info":[{"award-number":["62002082"]}]},{"name":"Guangxi Natural Science Foundation","award":["2020GXNSFBA238014"],"award-info":[{"award-number":["2020GXNSFBA238014"]}]},{"name":"Guangxi Natural Science Foundation","award":["202210595023"],"award-info":[{"award-number":["202210595023"]}]},{"name":"university student innovation training program project","award":["62002082"],"award-info":[{"award-number":["62002082"]}]},{"name":"university student innovation training program project","award":["2020GXNSFBA238014"],"award-info":[{"award-number":["2020GXNSFBA238014"]}]},{"name":"university student innovation training program project","award":["202210595023"],"award-info":[{"award-number":["202210595023"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Infrared and visible image fusion aims to generate a single fused image that not only contains rich texture details and salient objects, but also facilitates downstream tasks. 
However, existing works mainly focus on learning modality-specific or shared features and overlook the importance of modeling cross-modality features. To address this limitation, we propose Dual-branch Progressive learning for infrared and visible image fusion with a complementary self-Attention and Convolution (DPACFuse) network. On the one hand, we propose Cross-Modality Feature Extraction (CMEF) to enhance information interaction and the extraction of common features across modalities. In addition, we introduce a high-frequency gradient convolution operation to extract fine-grained information and suppress high-frequency information loss. On the other hand, to alleviate both the insufficient global information extraction of CNNs and the computational overhead of self-attention, we introduce ACmix, which can fully extract local and global information in the source image with a smaller computational overhead than pure convolution or pure self-attention. Extensive experiments demonstrated that the fused images generated by DPACFuse not only contain rich texture information but also effectively highlight salient objects. Additionally, our method achieved an improvement of approximately 3% over state-of-the-art methods on the MI, Qabf, SF, and AG evaluation metrics. 
More importantly, our fused images enhanced object detection and semantic segmentation by approximately 10%, compared to using infrared and visible images separately.<\/jats:p>","DOI":"10.3390\/s23167205","type":"journal-article","created":{"date-parts":[[2023,8,16]],"date-time":"2023-08-16T10:09:33Z","timestamp":1692180573000},"page":"7205","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["DPACFuse: Dual-Branch Progressive Learning for Infrared and Visible Image Fusion with Complementary Self-Attention and Convolution"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-2149-767X","authenticated-orcid":false,"given":"Huayi","family":"Zhu","sequence":"first","affiliation":[{"name":"School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-7201-8283","authenticated-orcid":false,"given":"Heshan","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China"}]},{"given":"Xiaolong","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China"}]},{"given":"Dongmei","family":"He","sequence":"additional","affiliation":[{"name":"School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China"}]},{"given":"Zhenbing","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Guilin University of Electronic Science and Technology, Guilin 541004, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2581-0520","authenticated-orcid":false,"given":"Xipeng","family":"Pan","sequence":"additional","affiliation":[{"name":"School of 
Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,8,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3","DOI":"10.11834\/jig.220422","article-title":"Deep learning-based image fusion: A survey","volume":"28","author":"Tang","year":"2023","journal-title":"J. Image Graph."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wang, J., Liu, A., Yin, Z., Liu, S., Tang, S., and Liu, X. (2021). Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World. arXiv.","DOI":"10.1109\/CVPR46437.2021.00846"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"5769","DOI":"10.1109\/TIP.2021.3082317","article-title":"Training Robust Deep Neural Networks via Adversarial Noise Propagation","volume":"30","author":"Liu","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zeng, Y., Zhang, D., Wang, C., Miao, Z., Liu, T., Zhan, X., Hao, D., and Ma, C. (2022, January 18\u201324). LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01666"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"102867","DOI":"10.1016\/j.media.2023.102867","article-title":"SMILE: Cost-sensitive multi-task learning for nuclear segmentation and classification with imbalanced annotations","volume":"88","author":"Pan","year":"2023","journal-title":"Med. Image Anal."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jin, C., Luo, C., Yan, M., Zhao, G., Zhang, G., and Zhang, S. (2023). Weakening the Dominant Role of Text: CMOSI Dataset and Multimodal Semantic Enhancement Network. IEEE Trans. Neural Netw. Learn. 
Syst., 1\u201315.","DOI":"10.1109\/TNNLS.2023.3282953"},{"key":"ref_7","unstructured":"Qin, H., Ding, Y., Zhang, M., Yan, Q., Liu, A., Dang, Q., Liu, Z., and Liu, X. (2022). BiBERT: Accurate Fully Binarized BERT. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Qin, H., Zhang, X., Gong, R., Ding, Y., Xu, Y., and Liu, X. (2022). Distribution-sensitive Information Retention for Accurate Binary Neural Network. arXiv.","DOI":"10.1007\/s11263-022-01687-5"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1049\/cit2.12153","article-title":"A semantic and emotion-based dual latent variable generation model for a dialogue system","volume":"8","author":"Yan","year":"2023","journal-title":"Caai Trans. Intell. Technol."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"39655","DOI":"10.1007\/s11042-022-13058-w","article-title":"Pedestrian detection in infrared image based on depth transfer learning","volume":"81","author":"Wang","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"107781","DOI":"10.1016\/j.compeleceng.2022.107781","article-title":"An infrared pedestrian detection method based on segmentation and domain adaptation learning","volume":"99","author":"Zhang","year":"2022","journal-title":"Comput. Electr. Eng."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Liu, J., Fan, X., Huang, Z., Wu, G., Liu, R., Zhong, W., and Luo, Z. (2022). Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object. arXiv.","DOI":"10.1109\/CVPR52688.2022.00571"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1109\/JAS.2022.105686","article-title":"SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer","volume":"9","author":"Ma","year":"2022","journal-title":"IEEE\/CAA J. Autom. 
Sin."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2121","DOI":"10.1109\/JAS.2022.106082","article-title":"SuperFusion: A Versatile Image Registration and Fusion Network with Semantic Awareness","volume":"9","author":"Tang","year":"2022","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, Z., Chen, Y., Shao, W., Li, H., and Zhang, L. (2022). SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images. arXiv.","DOI":"10.1109\/TIM.2022.3191664"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2761","DOI":"10.1007\/s11263-021-01501-8","article-title":"SDNet: A Versatile Squeeze-and-Decomposition Network for Real-Time Image Fusion","volume":"129","author":"Zhang","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1016\/j.inffus.2022.12.007","article-title":"AT-GAN: A generative adversarial network with attention and transition for infrared and visible image fusion","volume":"92","author":"Rao","year":"2023","journal-title":"Inf. Fusion"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2614","DOI":"10.1109\/TIP.2018.2887342","article-title":"DenseFuse: A Fusion Approach to Infrared and Visible Images","volume":"28","author":"Li","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1016\/j.inffus.2021.02.023","article-title":"RFN-Nest: An end-to-end residual fusion network for infrared and visible images","volume":"73","author":"Li","year":"2021","journal-title":"Inf. Fusion"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"9645","DOI":"10.1109\/TIM.2020.3005230","article-title":"NestFuse: An Infrared and Visible Image Fusion Architecture Based on Nest Connection and Spatial\/Channel Attention Models","volume":"69","author":"Li","year":"2020","journal-title":"IEEE Trans. Instrum. 
Meas."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.inffus.2021.12.004","article-title":"Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network","volume":"82","author":"Tang","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.inffus.2019.07.011","article-title":"IFCNN: A general image fusion framework based on convolutional neural network","volume":"54","author":"Zhang","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"5009513","DOI":"10.1109\/TIM.2021.3075747","article-title":"STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection","volume":"70","author":"Ma","year":"2021","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1186","DOI":"10.1109\/TCSVT.2021.3075745","article-title":"Efficient and Model-Based Infrared and Visible Image Fusion via Algorithm Unrolling","volume":"32","author":"Zhao","year":"2022","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liang, J., Cao, J., Sun, G., Zhang, K., Gool, L.V., and Timofte, R. (2021). SwinIR: Image Restoration Using Swin Transformer. arXiv.","DOI":"10.1109\/ICCVW54120.2021.00210"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Qu, L., Liu, S., Wang, M., and Song, Z. (2021). TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework using Self-Supervised Multi-Task Learning. arXiv.","DOI":"10.2139\/ssrn.4130858"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Pan, X., Ge, C., Lu, R., Song, S., Chen, G., Huang, Z., and Huang, G. (2022). On the Integration of Self-Attention and Convolution. 
arXiv.","DOI":"10.1109\/CVPR52688.2022.00089"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.inffus.2022.03.007","article-title":"PIAFusion: A progressive infrared and visible image fusion network based on illumination aware","volume":"83\u201384","author":"Tang","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref_29","first-page":"5005012","article-title":"Res2Fusion: Infrared and Visible Image Fusion Based on Dense Res2net and Double Nonlocal Attention Models","volume":"71","author":"Wang","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_30","first-page":"12797","article-title":"Rethinking the Image Fusion: A Fast Unified Image Fusion Network based on Proportional Maintenance of Gradient and Intensity","volume":"34","author":"Zhang","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1109\/TPAMI.2020.3012548","article-title":"U2Fusion: A Unified Unsupervised Image Fusion Network","volume":"44","author":"Xu","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.inffus.2018.09.004","article-title":"FusionGAN: A generative adversarial network for infrared and visible image fusion","volume":"48","author":"Ma","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4980","DOI":"10.1109\/TIP.2020.2977573","article-title":"DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion","volume":"29","author":"Ma","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Park, S., Choi, D.H., Kim, J.U., and Ro, Y.M. (2022, January 22\u201327). Robust thermal infrared pedestrian detection by associating visible pedestrian knowledge. 
Proceedings of the ICASSP 2022\u20142022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746886"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1109\/TIP.2022.3228497","article-title":"UIU-Net: U-Net in U-Net for Infrared Small Object Detection","volume":"32","author":"Wu","year":"2023","journal-title":"IEEE Trans. Image Process."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, A., Li, W., Wu, X., Huang, Z., and Tao, R. (2022, January 17\u201322). Mpanet: Multi-Patch Attention for Infrared Small Target Object Detection. Proceedings of the IGARSS 2022\u20142022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia.","DOI":"10.1109\/IGARSS46834.2022.9884041"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Sun, Y., Cao, B., Zhu, P., and Hu, Q. (2022, January 4\u20137). DetFusion: A Detection-Driven Infrared and Visible Image Fusion Network. Proceedings of the MM\u201922: 30th ACM International Conference on Multimedia, New York, NY, USA.","DOI":"10.1145\/3503161.3547902"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhao, W., Xie, S., Zhao, F., He, Y., and Lu, H. (2023, January 18\u201322). MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01341"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"101828","DOI":"10.1016\/j.inffus.2023.101828","article-title":"An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection","volume":"98","author":"Wang","year":"2023","journal-title":"Inf. Fusion"},{"key":"ref_40","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). 
Attention Is All You Need. arXiv."},{"key":"ref_41","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2021). Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhou, M., Yan, K., Huang, J., Yang, Z., Fu, X., and Zhao, F. (2022, January 18\u201324). Mutual Information-Driven Pan-Sharpening. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00184"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 11\u201317). Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhao, H., and Nie, R. (2021, January 24\u201326). DNDT: Infrared and Visible Image Fusion Via DenseNet and Dual-Transformer. Proceedings of the 2021 International Conference on Information Technology and Biomedical Engineering (ICITBE), Nanchang, China.","DOI":"10.1109\/ICITBE54178.2021.00025"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Rao, D., Wu, X., and Xu, T. (2022). TGFuse: An Infrared and Visible Image Fusion Approach Based on Transformer and Generative Adversarial Network. 
arXiv.","DOI":"10.1109\/TIP.2023.3273451"},{"key":"ref_47","first-page":"5012314","article-title":"CGTF: Convolution-Guided Transformer for Infrared and Visible Image Fusion","volume":"71","author":"Li","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-Excitation Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_49","unstructured":"Toet, A. (2023, January 01). TNO Image Fusion Dataset. Available online: https:\/\/figshare.com\/articles\/dataset\/TNO_Image_Fusion_Dataset\/1008029."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.inffus.2018.02.004","article-title":"Infrared and visible image fusion methods and applications: A survey","volume":"45","author":"Ma","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_51","first-page":"5005014","article-title":"GANMcC: A Generative Adversarial Network with Multiclassification Constraints for Infrared and Visible Image Fusion","volume":"70","author":"Ma","year":"2021","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_52","unstructured":"Reis, D., Kupec, J., Hong, J., and Daoudi, A. (2023). Real-Time Flying Object Detection with YOLOv8. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Padilla, R., Passos, W.L., Dias, T.L.B., Netto, S.L., and da Silva, E.A.B. (2021). A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. 
Electronics, 10.","DOI":"10.3390\/electronics10030279"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1016\/j.neunet.2021.01.021","article-title":"Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation","volume":"137","author":"Peng","year":"2021","journal-title":"Neural Netw."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/16\/7205\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:34:55Z","timestamp":1760128495000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/16\/7205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,16]]},"references-count":54,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2023,8]]}},"alternative-id":["s23167205"],"URL":"https:\/\/doi.org\/10.3390\/s23167205","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,16]]}}}