{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T17:57:04Z","timestamp":1768240624562,"version":"3.49.0"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62020106012, 62332008, 62106089, 62336004, 62202205, and 62576152"],"award-info":[{"award-number":["62020106012, 62332008, 62106089, 62336004, 62202205, and 62576152"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["JUSRP202504007"],"award-info":[{"award-number":["JUSRP202504007"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Basic Research Program of Jiangsu","award":["BK20250104"],"award-info":[{"award-number":["BK20250104"]}]},{"name":"111 Project of Ministry of Education of China","award":["B12018"],"award-info":[{"award-number":["B12018"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>\n                    In multi-modal imaging scenarios, the misalignment of images presents a persistent challenge. Conventional image fusion algorithms, aiming to enhance the performance of downstream vision tasks, presuppose strictly registered inputs to achieve satisfactory results. To relax this assumption, a common approach is to register the images first; however, existing multi-modal registration methods are often hindered by complex architectures and a heavy reliance on semantic information. 
This article proposes BusReF, a unified framework that jointly addresses image registration and fusion, with a specific focus on the Infrared-Visible Image Registration and Fusion (IVRF) task. Within this framework, unaligned image pairs are processed through three sequential stages: coarse registration, fine registration, and fusion. We demonstrate that this integrated approach enables more robust and accurate IVRF. Key to our framework is a novel training and evaluation strategy that employs masks to mitigate the influence of non-reconstructible regions on the loss function, thereby significantly improving the model\u2019s accuracy and robustness. Furthermore, we introduce a gradient-aware fusion network designed to effectively preserve complementary information from both modalities. Comprehensive experiments demonstrate that BusReF achieves superior performance when compared against various state-of-the-art registration and fusion algorithms. Our code is available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/Yukarizz\/BusReF\">https:\/\/github.com\/Yukarizz\/BusReF<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3773769","type":"journal-article","created":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T14:55:10Z","timestamp":1761836110000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["BusReF: Infrared-Visible Images Registration and Fusion Focus on Reconstructible Area Using One Set of Features"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1834-0559","authenticated-orcid":false,"given":"Zeyang","family":"Zhang","sequence":"first","affiliation":[{"name":"Jiangnan University, Wuxi, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4550-7879","authenticated-orcid":false,"given":"Hui","family":"Li","sequence":"additional","affiliation":[{"name":"Jiangnan University, Wuxi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9015-3128","authenticated-orcid":false,"given":"Tianyang","family":"Xu","sequence":"additional","affiliation":[{"name":"Jiangnan University, Wuxi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0310-5778","authenticated-orcid":false,"given":"Xiaojun","family":"Wu","sequence":"additional","affiliation":[{"name":"Jiangnan University, Wuxi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-4687-1487","authenticated-orcid":false,"given":"Congcong","family":"Bian","sequence":"additional","affiliation":[{"name":"Jiangnan University, Wuxi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8110-9205","authenticated-orcid":false,"given":"Josef","family":"Kittler","sequence":"additional","affiliation":[{"name":"Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, United Kingdom of Great Britain and Northern Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,1,12]]},"reference":[{"key":"e_1_3_1_2_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Arar Moab","year":"2020","unstructured":"Moab Arar, Yiftach Ginger, Dov Danon, Amit H. Bermano, and Daniel Cohen-Or. 2020. Unsupervised multi-modal image registration via geometry preserving image-to-image translation. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995637"},{"key":"e_1_3_1_4_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Chen Yinpeng","year":"2020","unstructured":"Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, and Zicheng Liu. 2020. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TIM.2021.3109379","article-title":"UNIFusion: A lightweight unified image fusion network","volume":"70","author":"Cheng Chunyang","year":"2021","unstructured":"Chunyang Cheng, Xiao-Jun Wu, Tianyang Xu, and Guoyang Chen. 2021. UNIFusion: A lightweight unified image fusion network. IEEE Transactions on Instrumentation and Measurement 70 (2021), 1\u201314.","journal-title":"IEEE Transactions on Instrumentation and Measurement"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.11.010"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.bspc.2023.104740"},{"key":"e_1_3_1_8_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. 
Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/26.477498"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8206396"},{"key":"e_1_3_1_11_2","first-page":"1","volume-title":"Proceedings of the 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT)","author":"Haghighat Mohammad","year":"2014","unstructured":"Mohammad Haghighat and Masoud Amir Kabiri Razian. 2014. Fast-FMI: Non-reference image fusion metric. In Proceedings of the 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), 1\u20133."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/42.925292"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1088\/0031-9155\/46\/3\/201"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19797-0_31"},{"key":"e_1_3_1_15_2","volume-title":"Advances in Neural Information Processing Systems","volume":"28","author":"Jaderberg Max","year":"2015","unstructured":"Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. 2015. Spatial transformer networks. In Advances in Neural Information Processing Systems. C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (Eds.), Vol. 28, Curran Associates, Inc."},{"key":"e_1_3_1_16_2","first-page":"3496","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV) Workshops","author":"Jia Xinyu","year":"2021","unstructured":"Xinyu Jia, Chuang Zhu, Minzhen Li, Wenqi Tang, and Wenli Zhou. 2021. LLVIP: A visible-infrared paired dataset for low-light vision. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV) Workshops, 3496\u20133504."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2024.121291"},{"key":"e_1_3_1_18_2","first-page":"9853","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Kong Lingke","year":"2023","unstructured":"Lingke Kong, X. Sharon Qi, Qijin Shen, Jiacheng Wang, Jingyi Zhang, Yanle Hu, and Qichao Zhou. 2023. Indescribable multi-modal spatial evaluator. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9853\u20139862."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2887342"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIM.2020.3005230"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2975984"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2021.02.023"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3268209"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.02.011"},{"key":"e_1_3_1_25_2","first-page":"17346","volume-title":"Advances in Neural Information Processing Systems","author":"Li Xinghui","year":"2020","unstructured":"Xinghui Li, Kai Han, Shuda Li, and Victor Prisacariu. 2020. Dual-resolution correspondence networks. In Advances in Neural Information Processing Systems. H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 
33, Curran Associates, Inc., 17346\u201317357."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3574136"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3056725"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-023-01952-1"},{"issue":"4","key":"e_1_3_1_29_2","first-page":"2349","article-title":"Infrared and visible image fusion: From data compatibility to task adaption","volume":"47","author":"Liu Jinyuan","year":"2024","unstructured":"Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, and Xin Fan. 2024. Infrared and visible image fusion: From data compatibility to task adaption. IEEE Transactions on Pattern Analysis and Machine Intelligence 47, 4 (2024), 2349\u20132369.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3665893"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2017.02.006"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/JAS.2022.105686"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIM.2021.3065426"},{"key":"e_1_3_1_35_2","first-page":"18433","volume-title":"Advances in Neural Information Processing Systems","author":"Pielawski Nicolas","year":"2020","unstructured":"Nicolas Pielawski, Elisabeth Wetzer, Johan \u00d6fverstedt, Jiahao Lu, Carolina W\u00e4hlby, Joakim Lindblad, and Natasa Sladoje. 2020. CoMIR: Contrastive multimodal image representation for registration. In Advances in Neural Information Processing Systems. H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 
33, Curran Associates, Inc., 18433\u201318444."},{"key":"e_1_3_1_36_2","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1007\/978-3-030-58545-7_35","volume-title":"Computer Vision \u2013 ECCV 2020","author":"Rocco Ignacio","year":"2020","unstructured":"Ignacio Rocco, Relja Arandjelovi\u0107, and Josef Sivic. 2020. Efficient neighbourhood consensus networks via submanifold sparse convolutions. In Computer Vision \u2013 ECCV 2020. Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.), Springer International Publishing, Cham, 605\u2013621."},{"key":"e_1_3_1_37_2","volume-title":"Advances in Neural Information Processing Systems","author":"Rocco Ignacio","year":"2018","unstructured":"Ignacio Rocco, Mircea Cimpoi, Relja Arandjelovi\u0107, Akihiko Torii, Tomas Pajdla, and Josef Sivic. 2018. Neighbourhood consensus networks. In Advances in Neural Information Processing Systems. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11760-012-0361-x"},{"key":"e_1_3_1_39_2","first-page":"8922","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Sun Jiaming","year":"2021","unstructured":"Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. 2021. LoFTR: Detector-free local feature matching with transformers. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8922\u20138931."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/JAS.2022.106082"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.03.007"},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","first-page":"4531","DOI":"10.1109\/TMM.2025.3535291","article-title":"Robust one-stop multi-modality image registration-fusion-segmentation framework against misalignments and adversarial attacks","author":"Wang Di","year":"2025","unstructured":"Di Wang, Xianghao Jiao, Jinyuan Liu, and Xin Fan. 2025. Robust one-stop multi-modality image registration-fusion-segmentation framework against misalignments and adversarial attacks. IEEE Transactions on Multimedia 27 (2025), 4531\u20134543.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_43_2","doi-asserted-by":"crossref","unstructured":"Di Wang Jinyuan Liu Xin Fan and Risheng Liu. 2022. Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration. arXiv:2205.11876. Retrieved from https:\/\/arxiv.org\/abs\/2205.11876","DOI":"10.24963\/ijcai.2022\/487"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.101828"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2024.3412743"},{"key":"e_1_3_1_46_2","first-page":"2746","volume-title":"Proceedings of the Asian Conference on Computer Vision (ACCV)","author":"Wang Qing","year":"2022","unstructured":"Qing Wang, Jiaming Zhang, Kailun Yang, Kunyu Peng, and Rainer Stiefelhagen. 2022. MatchFormer: Interleaving attention in transformers for feature matching. 
In Proceedings of the Asian Conference on Computer Vision (ACCV), 2746\u20132762."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3531016"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.101835"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3012548"},{"key":"e_1_3_1_51_2","first-page":"19679","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Xu Han","year":"2022","unstructured":"Han Xu, Jiayi Ma, Jiteng Yuan, Zhuliang Le, and Wei Liu. 2022. RFNet: Unsupervised network for mutually reinforcing multi-modal image registration and fusion. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 19679\u201319688."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3283682"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2021.06.008"},{"key":"e_1_3_1_54_2","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV)","author":"Zhang Ting","year":"2017","unstructured":"Ting Zhang, Guo-Jun Qi, Bin Xiao, and Jingdong Wang. 2017. Interleaved group convolutions. In Proceedings of the IEEE International Conference on Computer Vision (ICCV)."},{"key":"e_1_3_1_55_2","first-page":"II","volume-title":"Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings","volume":"2","author":"Zhao Feng","year":"2006","unstructured":"Feng Zhao, Qingming Huang, and Wen Gao. 2006. Image matching by normalized cross-correlation. In Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Vol. 
2, II."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3707462"},{"key":"e_1_3_1_57_2","first-page":"5906","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhao Zixiang","year":"2023","unstructured":"Zixiang Zhao, Haowen Bai, Jiangshe Zhang, Yulun Zhang, Shuang Xu, Zudi Lin, Radu Timofte, and Luc Van Gool. 2023. CDDFuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5906\u20135916."},{"key":"e_1_3_1_58_2","doi-asserted-by":"crossref","unstructured":"Zixiang Zhao Shuang Xu Chunxia Zhang Junmin Liu Pengfei Li and Jiangshe Zhang. 2020. DIDFuse: Deep image decomposition for infrared and visible image fusion. arXiv:2003.09210. Retrieved from https:\/\/arxiv.org\/abs\/2003.09210","DOI":"10.24963\/ijcai.2020\/135"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3616495"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.10.017"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3773769","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T14:30:02Z","timestamp":1768228202000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3773769"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,12]]},"references-count":60,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3773769"],"URL":"https:\/\/doi.org\/10.1145\/3773769","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,12]]},"assertion":[{"value":"2025-03-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-21","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}