{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T03:15:08Z","timestamp":1776309308387,"version":"3.50.1"},"reference-count":53,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T00:00:00Z","timestamp":1717027200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2018YFE0206900"],"award-info":[{"award-number":["2018YFE0206900"]}]},{"name":"National Key Research and Development Program of China","award":["SMDTKL-2023-1-01"],"award-info":[{"award-number":["SMDTKL-2023-1-01"]}]},{"name":"Open Project of Key Laboratory for Quality Evaluation of Ultrasound Surgical Equipment of National Medical Products Administration","award":["2018YFE0206900"],"award-info":[{"award-number":["2018YFE0206900"]}]},{"name":"Open Project of Key Laboratory for Quality Evaluation of Ultrasound Surgical Equipment of National Medical Products Administration","award":["SMDTKL-2023-1-01"],"award-info":[{"award-number":["SMDTKL-2023-1-01"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Multi-modal medical image fusion (MMIF) is crucial for disease diagnosis and treatment because the images reconstructed from signals collected by different sensors can provide complementary information. In recent years, deep learning (DL) based methods have been widely used in MMIF. However, these methods often adopt a serial fusion strategy without feature decomposition, causing error accumulation and confusion of characteristics across different scales. To address these issues, we have proposed the Coupled Image Reconstruction and Fusion (CIRF) strategy. Our method parallels the image fusion and reconstruction branches which are linked by a common encoder. Firstly, CIRF uses the lightweight encoder to extract base and detail features, respectively, through the Vision Transformer (ViT) and the Convolutional Neural Network (CNN) branches, where the two branches interact to supplement information. Then, two types of features are fused separately via different blocks and finally decoded into fusion results. In the loss function, both the supervised loss from the reconstruction branch and the unsupervised loss from the fusion branch are included. As a whole, CIRF increases its expressivity by adding multi-task learning and feature decomposition. Additionally, we have also explored the impact of image masking on the network\u2019s feature extraction ability and validated the generalization capability of the model. Through experiments on three datasets, it has been demonstrated both subjectively and objectively, that the images fused by CIRF exhibit appropriate brightness and smooth edge transition with more competitive evaluation metrics than those fused by several other traditional and DL-based methods.<\/jats:p>","DOI":"10.3390\/s24113545","type":"journal-article","created":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T03:46:49Z","timestamp":1717127209000},"page":"3545","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["CIRF: Coupled Image Reconstruction and Fusion Strategy for Deep Learning Based Multi-Modal Image Fusion"],"prefix":"10.3390","volume":"24","author":[{"given":"Junze","family":"Zheng","sequence":"first","affiliation":[{"name":"Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junyan","family":"Xiao","sequence":"additional","affiliation":[{"name":"Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yaowei","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4332-071X","authenticated-orcid":false,"given":"Xuming","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1118\/1.1636163","article-title":"MRI: Basic principles and applications","volume":"31","author":"Brown","year":"1995","journal-title":"Med. Phys."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"115","DOI":"10.2967\/jnmt.107.042978","article-title":"Principles of CT and CT technology","volume":"35","author":"Goldman","year":"2007","journal-title":"J. Nucl. Med. Technol."},{"key":"ref_3","first-page":"4S","article-title":"PET\/CT today and tomorrow","volume":"45","author":"Townsend","year":"2004","journal-title":"J. Nucl. Med."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1959","DOI":"10.1007\/s00259-010-1390-8","article-title":"A review on the clinical uses of SPECT\/CT","volume":"37","author":"Mariani","year":"2010","journal-title":"Eur. J. Nucl. Med. Mol. Imaging"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.neucom.2015.07.160","article-title":"An overview of multi-modal medical image fusion","volume":"215","author":"Du","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, Z., Cuia, Z., and Zhu, Y. (2020). Multi-modal medical image fusion by Laplacian pyramid and adaptive sparse representation. Comput. Biol. Med., 123.","DOI":"10.1016\/j.compbiomed.2020.103823"},{"key":"ref_7","first-page":"21","article-title":"Medical image fusion method by deep learning","volume":"2","author":"Li","year":"2021","journal-title":"Int. J. Cogn. Comput. Eng."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Bai, H., Zhang, J., Zhang, Y., Xu, S., Lin, Z., Timofte, R., and Van Gool, L. (2023, January 17\u201324). Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00572"},{"key":"ref_9","unstructured":"(2024, February 25). RIRE. Available online: https:\/\/rire.insight-journal.org\/."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/j.inffus.2021.06.008","article-title":"Image fusion meets deep learning: A survey and perspective","volume":"76","author":"Zhang","year":"2021","journal-title":"Inf. Fusion"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Du, J., Fang, M., Yu, Y., and Lu, G. (2020). An adaptive two-scale biomedical image fusion method with statistical comparisons. Comput. Methods Programs Biomed., 196.","DOI":"10.1016\/j.cmpb.2020.105603"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1016\/j.patrec.2008.10.003","article-title":"Optimal multi-level thresholding using a two-stage Otsu optimization approach","volume":"30","author":"Huang","year":"2009","journal-title":"Pattern Recognit. Lett."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1573","DOI":"10.1016\/j.eswa.2014.09.049","article-title":"Modified artificial bee colony based computationally efficient multilevel thresholding for satellite image segmentation using Kapur\u2019s, Otsu and Tsallis functions","volume":"42","author":"Bhandari","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2864","DOI":"10.1109\/TIP.2013.2244222","article-title":"Image fusion with guided filtering","volume":"22","author":"Li","year":"2013","journal-title":"IEEE Trans. Image Process."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"S447","DOI":"10.1016\/S0167-8140(04)82934-7","article-title":"Comparison of CT and CT-PET-fusion based 3D treatment plans in the percutaneous radiotherapy of lung cancer","volume":"73","author":"Kremp","year":"2004","journal-title":"Proc. Radiother. Oncol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1109\/TIM.2018.2838778","article-title":"Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain","volume":"68","author":"Yin","year":"2018","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"633","DOI":"10.1049\/iet-ipr.2012.0558","article-title":"Multi-focus image fusion based on non-subsampled shearlet transform","volume":"7","author":"Gao","year":"2013","journal-title":"IET Image Process."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"880","DOI":"10.1109\/TNN.2011.2128880","article-title":"A new automatic parameter setting method of a simplified PCNN for image segmentation","volume":"22","author":"Chen","year":"2011","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"6880","DOI":"10.1109\/TIM.2020.2975405","article-title":"Laplacian redecomposition for multimodal medical image fusion","volume":"69","author":"Li","year":"2020","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_20","first-page":"671","article-title":"The laplacian pyramid as a compact image code","volume":"31","author":"Burt","year":"1987","journal-title":"Readings Comput. Vis."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1504\/IJBET.2020.110999","article-title":"A review on multimodal medical image fusion","volume":"34","author":"Reddy","year":"2020","journal-title":"Int. J. Biomed. Eng. Technol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.inffus.2019.07.011","article-title":"IFCNN: A general image fusion framework based on convolutional neural network","volume":"54","author":"Zhang","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1109\/TPAMI.2020.3012548","article-title":"U2Fusion: A unified unsupervised image fusion network","volume":"44","author":"Xu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2614","DOI":"10.1109\/TIP.2018.2887342","article-title":"DenseFuse: A fusion approach to infrared and visible images","volume":"28","author":"Li","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"9645","DOI":"10.1109\/TIM.2020.3005230","article-title":"NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial\/channel attention models","volume":"69","author":"Li","year":"2020","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1016\/j.inffus.2021.02.023","article-title":"RFN-Nest: An end-to-end residual fusion network for infrared and visible images","volume":"73","author":"Li","year":"2021","journal-title":"Inf. Fusion"},{"key":"ref_27","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_28","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"5134","DOI":"10.1109\/TIP.2022.3193288","article-title":"MATR: Multimodal medical image fusion via multiscale adaptive transformer","volume":"31","author":"Tang","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_30","first-page":"5019711","article-title":"Transformer-based end-to-end anatomical and functional image fusion","volume":"71","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"21741","DOI":"10.1007\/s00521-022-07635-1","article-title":"Multi-modal medical image fusion based on densely-connected high-resolution CNN and hybrid transformer","volume":"34","author":"Zhou","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3489","DOI":"10.1109\/JBHI.2023.3264819","article-title":"An Improved Hybrid Network With a Transformer Module for Medical Image Fusion","volume":"27","author":"Liu","year":"2023","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_33","unstructured":"Wu, Z., Liu, Z., Lin, J., Lin, Y., and Han, S. (2020). Lite transformer with long-short range attention. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., and Yang, M.H. (2022, January 18\u201324). Restormer: Efficient transformer for high-resolution image restoration. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00564"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Peng, Z., Huang, W., Gu, S., Xie, L., Wang, Y., Jiao, J., and Ye, Q. (2021, January 11\u201317). Conformer: Local features coupling global representations for visual recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00042"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Fang, J., Lin, H., Chen, X., and Zeng, K. (2022, January 18\u201324). A hybrid network of cnn and transformer for lightweight image super-resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00119"},{"key":"ref_37","unstructured":"Clevert, D.A., Unterthiner, T., and Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (elus). arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201323). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"3348","DOI":"10.1109\/TMI.2023.3283517","article-title":"F-DARTS: Foveated differentiable architecture search based multimodal medical image fusion","volume":"42","author":"Ye","year":"2023","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"He, K., Chen, X., Xie, S., Li, Y., Doll\u00e1r, P., and Girshick, R. (2022, January 18\u201324). Masked autoencoders are scalable vision learners. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_42","first-page":"271","article-title":"A 3 \u00d7 3 isotropic gradient operator for image processing","volume":"1968","author":"Sobel","year":"1968","journal-title":"Talk Stanf. Artif. Proj."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.inffus.2021.12.004","article-title":"Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network","volume":"82","author":"Tang","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1890","DOI":"10.1016\/j.aeue.2015.09.004","article-title":"A new image quality metric for image fusion: The sum of the correlations of differences","volume":"69","author":"Aslantas","year":"2015","journal-title":"Aeu-Int. J. Electron. Commun."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1049\/el:20000267","article-title":"Objective image fusion performance measure","volume":"36","author":"Xydeas","year":"2000","journal-title":"Electron. Lett."},{"key":"ref_46","unstructured":"(2024, February 25). Atlas. Available online: https:\/\/www.med.harvard.edu\/aanlib\/."},{"key":"ref_47","unstructured":"(2024, February 25). IXI. Available online: https:\/\/brain-development.org\/."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1109\/TMI.2009.2035616","article-title":"Elastix: A toolbox for intensity-based medical image registration","volume":"29","author":"Klein","year":"2009","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_49","first-page":"50","article-title":"Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer\u2019s disease","volume":"7","author":"Shamonin","year":"2014","journal-title":"Front. Neuroinform."},{"key":"ref_50","first-page":"4819","article-title":"Deep learning-based multi-focus image fusion: A survey and a comparative study","volume":"44","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_51","first-page":"1","article-title":"A review of multimodal medical image fusion techniques","volume":"2020","author":"Huang","year":"2020","journal-title":"Comput. Math. Methods Med."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1016\/j.inffus.2011.08.002","article-title":"A new image fusion performance metric based on visual information fidelity","volume":"14","author":"Han","year":"2013","journal-title":"Inf. Fusion"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1016\/j.inffus.2005.04.003","article-title":"A new metric based on extended spatial frequency and its application to DWT based fusion algorithms","volume":"8","author":"Zheng","year":"2007","journal-title":"Inf. Fusion"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/11\/3545\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:51:25Z","timestamp":1760107885000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/11\/3545"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,30]]},"references-count":53,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["s24113545"],"URL":"https:\/\/doi.org\/10.3390\/s24113545","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,30]]}}}