{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T19:35:56Z","timestamp":1768073756247,"version":"3.49.0"},"reference-count":29,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T00:00:00Z","timestamp":1722988800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Small UAV target detection and tracking based on cross-modality image fusion have gained widespread attention. Because small UAVs occupy only a minimal number of pixels in an image, the feature information they provide is limited, and detection and tracking algorithms must therefore be particularly precise in complex backgrounds. Image fusion techniques can enrich the detailed information for small UAVs and show significant advantages under extreme lighting conditions. Image registration is a fundamental step preceding image fusion: accurate alignment must be achieved before fusion to prevent severe ghosting and artifacts. This paper focuses on the alignment of small UAV targets in infrared and visible light imagery. To address this issue, it proposes a deep-learning-based cross-modality image registration network comprising a structure preservation and style transformation network (SPSTN) and a multi-level cross-attention residual registration network (MCARN). First, the SPSTN performs modality transformation, converting the cross-modality task into a single-modality task to reduce the information discrepancy between modalities. Then, the MCARN performs single-modality image registration, deeply extracting and fusing features from pseudo infrared and visible images to achieve efficient registration. To validate the effectiveness of the proposed method, comprehensive experimental evaluations were conducted on the Anti-UAV dataset. The results confirm the superiority and generality of the proposed cross-modality image registration framework, which plays a crucial role in subsequent image fusion tasks for more effective target detection.<\/jats:p>","DOI":"10.3390\/rs16162880","type":"journal-article","created":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T11:33:52Z","timestamp":1723030432000},"page":"2880","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["A Multi-Level Cross-Attention Image Registration Method for Visible and Infrared Small Unmanned Aerial Vehicle Targets via Image Style Transfer"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5793-4363","authenticated-orcid":false,"given":"Wen","family":"Jiang","sequence":"first","affiliation":[{"name":"Radar Monitoring Technology Laboratory, School of Information Science and Technology, North China University of Technology, Beijing 100144, China"}]},{"given":"Hanxin","family":"Pan","sequence":"additional","affiliation":[{"name":"Radar Monitoring Technology Laboratory, School of Information Science and Technology, North China University of Technology, Beijing 100144, China"}]},{"given":"Yanping","family":"Wang","sequence":"additional","affiliation":[{"name":"Radar Monitoring Technology Laboratory, School of Information Science and Technology, North China University of Technology, Beijing 100144, China"}]},{"given":"Yang","family":"Li","sequence":"additional","affiliation":[{"name":"Radar Monitoring Technology Laboratory, School of Information Science and Technology, North China University of Technology, Beijing 100144, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3020-5715","authenticated-orcid":false,"given":"Yun","family":"Lin","sequence":"additional","affiliation":[{"name":"Radar Monitoring Technology Laboratory, School of Information Science and Technology, North China University of Technology, Beijing 100144, China"}]},{"given":"Fukun","family":"Bi","sequence":"additional","affiliation":[{"name":"Radar Monitoring Technology Laboratory, School of Information Science and Technology, North China University of Technology, Beijing 100144, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1016\/S0262-8856(03)00137-9","article-title":"Image registration methods: A survey","volume":"21","author":"Flusser","year":"2003","journal-title":"Image Vis. Comput."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, N., Li, Y., and Jiao, J. (2024). Multimodal remote sensing image registration based on adaptive multi-scale PIIFD. Multimed. Tools Appl., 1\u201313.","DOI":"10.1007\/s11042-024-18756-1"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3296","DOI":"10.1109\/TIP.2019.2959244","article-title":"RIFT: Multi-modal image matching based on radiation-invariant feature transform","volume":"29","author":"Li","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3078","DOI":"10.1109\/TGRS.2018.2790483","article-title":"OS-SIFT: A Robust SIFT-like algorithm for high-resolution optical-to-SAR Image registration in suburban areas","volume":"56","author":"Xiang","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_6","first-page":"1","article-title":"Cross-Modality Image Matching Network with Modality-Invariant Feature Representation for Airborne-Ground Thermal Infrared and Visible Datasets","volume":"60","author":"Cui","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1109\/TIP.2022.3231135","article-title":"ReDFeat: Recoupling Detection and Description for Cross-Modal Feature Learning","volume":"32","author":"Deng","year":"2023","journal-title":"IEEE Trans. Image Process."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"108792","DOI":"10.1016\/j.patcog.2022.108792","article-title":"Learning attention-guided pyramidal features for few-shot fine-grained recognition","volume":"130","author":"Tang","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhu, J.Y., Park, T., Isola, P., and Efros, A.A. (2017, January 22\u201329). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.244"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Luo, Y., Cha, H., Zuo, L., Cheng, P., and Zhao, Q. (2023). General cross-modality registration framework for visible and infrared UAV target image registration. Sci. Rep., 13.","DOI":"10.1038\/s41598-023-39863-3"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"12148","DOI":"10.1109\/TPAMI.2023.3283682","article-title":"MURF: Mutually Reinforcing Multi-modal Image Registration and Fusion","volume":"45","author":"Xu","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, D., Liu, J., Fan, X., and Liu, R. (2022). Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration. arXiv.","DOI":"10.24963\/ijcai.2022\/487"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1007\/s00138-020-01060-x","article-title":"Deep learning in medical image registration: A survey","volume":"31","author":"Haskins","year":"2020","journal-title":"Mach. Vis. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., and Dalca, A.V. (2018, January 18\u201322). An unsupervised learning model for deformable medical image registration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00964"},{"key":"ref_15","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J.Y., Zhou, T., and Efros, A.A. (2017, January 21\u201326). Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.632"},{"key":"ref_17","unstructured":"Wolterink, J.M., Dinkla, A.M., Savenije, M.H., Seevinck, P.R., van den Berg, C.A., and I\u0161gum, I. (2017). Deep MR to CT synthesis using unpaired data. Simulation and Synthesis in Medical Imaging: Second International Workshop, SASHIMI 2017, Held in Conjunction with MICCAI 2017, Qu\u00e9bec City, QC, Canada, 10 September 2017, Springer International Publishing. Proceedings 2."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.media.2018.07.002","article-title":"Weakly-supervised convolutional neural networks for cross-modal image registration","volume":"49","author":"Hu","year":"2018","journal-title":"Med. Image Anal."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/S0031-3203(98)00091-0","article-title":"An overlap invariant entropy measure of 3D medical image alignment","volume":"32","author":"Studholme","year":"1999","journal-title":"Pattern Recognit."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1109\/42.563664","article-title":"Cross-modality image registration by maximization of mutual information","volume":"16","author":"Maes","year":"1997","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1109\/TMI.2003.809072","article-title":"PET-CT image registration in the chest using free-form deformations","volume":"22","author":"Mattes","year":"2003","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/S1361-8415(01)80004-9","article-title":"Multi-modal volume registration by maximization of mutual information","volume":"1","author":"Wells","year":"1996","journal-title":"Med. Image Anal."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1882","DOI":"10.1109\/TMI.2010.2053043","article-title":"Intensity-based image registration by minimizing residual complexity","volume":"29","author":"Myronenko","year":"2010","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"712","DOI":"10.1109\/42.796284","article-title":"Nonrigid registration using free-form deformations: Application to breast MR images","volume":"18","author":"Rueckert","year":"1999","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"986","DOI":"10.1109\/TMI.2003.815867","article-title":"Mutual-information-based registration of medical images: A survey","volume":"22","author":"Pluim","year":"2003","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1266","DOI":"10.1109\/83.506761","article-title":"An FFT-based technique for translation, rotation, and scale-invariant image registration","volume":"5","author":"Reddy","year":"1996","journal-title":"IEEE Trans. Image Process."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.neucom.2020.07.066","article-title":"RegiNet: Gradient guided multispectral image registration using convolutional neural networks","volume":"415","author":"Wei","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Arar, M., Ginger, Y., Danon, D., Bermano, A.H., and Cohen-Or, D. (2020, January 14\u201319). Unsupervised multi-modal image registration via geometry preserving image-to-image translation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01342"},{"key":"ref_29","unstructured":"Katharopoulos, A., Vyas, A., Pappas, N., and Fleuret, F. (2020, January 13\u201318). Transformers are RNNs: Fast autoregressive transformers with linear attention. Proceedings of the International Conference on Machine Learning, Virtual Event."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/2880\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:31:21Z","timestamp":1760110281000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/2880"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,7]]},"references-count":29,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["rs16162880"],"URL":"https:\/\/doi.org\/10.3390\/rs16162880","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,7]]}}}