{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T18:40:33Z","timestamp":1772822433473,"version":"3.50.1"},"reference-count":60,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T00:00:00Z","timestamp":1723161600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61971426"],"award-info":[{"award-number":["61971426"]}]},{"name":"National Natural Science Foundation of China","award":["2024JJ6466"],"award-info":[{"award-number":["2024JJ6466"]}]},{"name":"National Natural Science Foundation of China","award":["GZC20233545"],"award-info":[{"award-number":["GZC20233545"]}]},{"name":"Natural Science Foundation of Hunan Province of China","award":["61971426"],"award-info":[{"award-number":["61971426"]}]},{"name":"Natural Science Foundation of Hunan Province of China","award":["2024JJ6466"],"award-info":[{"award-number":["2024JJ6466"]}]},{"name":"Natural Science Foundation of Hunan Province of China","award":["GZC20233545"],"award-info":[{"award-number":["GZC20233545"]}]},{"name":"Postdoctoral Fellowship Program of CPSF","award":["61971426"],"award-info":[{"award-number":["61971426"]}]},{"name":"Postdoctoral Fellowship Program of CPSF","award":["2024JJ6466"],"award-info":[{"award-number":["2024JJ6466"]}]},{"name":"Postdoctoral Fellowship Program of CPSF","award":["GZC20233545"],"award-info":[{"award-number":["GZC20233545"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Simulated data play an important role in SAR target recognition, particularly under zero-shot learning (ZSL) conditions caused by the lack of training samples. 
The traditional SAR simulation method relies on manually constructed 3D target models for electromagnetic simulation, which is costly and limited by the target\u2019s prior knowledge base. In addition, the unavoidable discrepancy between simulated and measured SAR data further limits the usefulness of traditional simulation for target recognition. To address the challenge of SAR target recognition under ZSL conditions, this paper proposes a SAR simulation method based on a visual language model and a generative diffusion model: target semantic information is extracted from optical remote sensing images and transformed into a 3D model for SAR simulation. To reduce the domain shift between the simulated and measured domains, we also propose a domain adaptation method based on a dynamic-weight domain loss and a classification loss. The effectiveness of the semantic information-based 3D models is validated on the MSTAR dataset, and the feasibility of the proposed framework is validated on a self-built civilian vehicle dataset. 
The experimental results demonstrate that the first proposed SAR simulation method based on a visual language model and generative diffusion model can effectively improve target recognition performance under ZSL conditions.<\/jats:p>","DOI":"10.3390\/rs16162927","type":"journal-article","created":{"date-parts":[[2024,8,12]],"date-time":"2024-08-12T08:54:08Z","timestamp":1723452848000},"page":"2927","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Leveraging Visual Language Model and Generative Diffusion Model for Zero-Shot SAR Target Recognition"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-9132-5270","authenticated-orcid":false,"given":"Junyu","family":"Wang","sequence":"first","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Hao","family":"Sun","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Tao","family":"Tang","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1828-0392","authenticated-orcid":false,"given":"Yuli","family":"Sun","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Qishan","family":"He","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Lin","family":"Lei","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, 
China"}]},{"given":"Kefeng","family":"Ji","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, J., Yu, Z., Yu, L., Cheng, P., Chen, J., and Chi, C. (2023). A comprehensive survey on SAR ATR in deep-learning era. Remote Sens., 15.","DOI":"10.3390\/rs15051454"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4806","DOI":"10.1109\/TGRS.2016.2551720","article-title":"Target classification using the deep convolutional networks for SAR images","volume":"54","author":"Chen","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2245","DOI":"10.1109\/LGRS.2017.2758900","article-title":"Zero-shot learning of SAR target feature space with deep generative neural networks","volume":"14","author":"Song","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1016\/j.isprsjprs.2023.12.004","article-title":"Physics inspired hybrid attention for SAR target recognition","volume":"207","author":"Huang","year":"2024","journal-title":"Isprs J. Photogramm. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"5214414","DOI":"10.1109\/TGRS.2023.3305094","article-title":"Simulation Aided SAR Target Classification via Dual Branch Reconstruction and Subdomain Alignment","volume":"61","author":"Lv","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1109\/LGRS.2019.2936897","article-title":"EM Simulation-Aided Zero-Shot Learning for SAR Automatic Target Recognition","volume":"17","author":"Song","year":"2019","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2942","DOI":"10.1109\/JSTARS.2021.3059991","article-title":"Bridging a gap in SAR-ATR: Training on fully synthetic and testing on measured data","volume":"14","author":"Inkawhich","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_8","first-page":"168","article-title":"Simulation-assisted SAR target classification based on unsupervised domain adaptation and model interpretability analysis","volume":"11","author":"Lyu","year":"2022","journal-title":"J. Radars"},{"key":"ref_9","unstructured":"Zelnio, E., and Garber, F.D. (2019, January 18). A SAR dataset for ATR Development: The Synthetic and Measured Paired Labeled Experiment (SAMPLE). Proceedings of the Algorithms for Synthetic Aperture Radar Imagery XXVI, Baltimore, MD, USA."},{"key":"ref_10","first-page":"1","article-title":"Two-Stage Cross-Modality Transfer Learning Method for Military-Civilian SAR Ship Recognition","volume":"19","author":"Song","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_11","first-page":"1","article-title":"Unsupervised Domain Adaptation Based on Progressive Transfer for Ship Detection: From Optical to SAR Images","volume":"60","author":"Shi","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"9816","DOI":"10.1109\/JSTARS.2023.3324768","article-title":"Learning to Find the Optimal Correspondence Between SAR and Optical Image Patches","volume":"16","author":"Li","year":"2023","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"4611","DOI":"10.1109\/JSTARS.2024.3357171","article-title":"MGSFA-Net: Multiscale Global Scattering Feature Association Network for SAR Ship Target Recognition","volume":"17","author":"Zhang","year":"2024","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. 
Remote Sens."},{"key":"ref_14","first-page":"4051","article-title":"A Review of Generalized Zero-Shot Learning Methods","volume":"45","author":"Pourpanah","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","first-page":"1","article-title":"Learn to Recognize Unknown SAR Targets From Reflection Similarity","volume":"19","author":"Wei","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_16","first-page":"1","article-title":"Zero-shot SAR target recognition based on classification assistance","volume":"20","author":"Wei","year":"2023","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ma, Y., Pei, J., Zhang, X., Huo, W., Zhang, Y., Huang, Y., and Yang, J. (2023, January 1\u20134). An Optical Image-Aided Approach for Zero-Shot SAR Image Scene Classification. Proceedings of the 2023 IEEE Radar Conference (RadarConf23), San Antonio, TX, USA.","DOI":"10.1109\/RadarConf2351548.2023.10149719"},{"key":"ref_18","unstructured":"Silva, J.D., Magalh\u00e3es, J., Tuia, D., and Martins, B. (2024). Large Language Models for Captioning and Retrieving Remote Sensing Images. arXiv."},{"key":"ref_19","first-page":"103672","article-title":"Exploring region features in remote sensing image captioning","volume":"127","author":"Zhao","year":"2024","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_20","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_21","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). 
An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"5502705","DOI":"10.1109\/LGRS.2024.3360184","article-title":"LFSMIM: A Low-Frequency Spectral Masked Image Modeling Method for Hyperspectral Image Classification","volume":"21","author":"Chen","year":"2024","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, F., Chen, D., Guan, Z., Zhou, X., Zhu, J., Ye, Q., Fu, L., and Zhou, J. (2023). RemoteCLIP: A Vision Language Foundation Model for Remote Sensing. arXiv.","DOI":"10.1109\/TGRS.2024.3390838"},{"key":"ref_24","unstructured":"Hu, Y., Yuan, J., Wen, C., Lu, X., and Li, X. (2023). RSGPT: A Remote Sensing Vision Language Model and Benchmark. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Auer, S., Bamler, R., and Reinartz, P. (2016, January 10\u201315). RaySAR-3D SAR simulator: Now open source. Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China.","DOI":"10.1109\/IGARSS.2016.7730757"},{"key":"ref_26","unstructured":"Hammer, H., and Schulz, K. (September, January 31). Coherent simulation of SAR images. Proceedings of the Image and Signal Processing for Remote Sensing XV, Berlin, Germany."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3519","DOI":"10.1109\/TGRS.2009.2022326","article-title":"Hybrid GPU-based single-and double-bounce SAR simulation","volume":"47","author":"Balz","year":"2009","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"\u00d8degaard, N., Knapskog, A.O., Cochin, C., and Louvigne, J.C. (2016, January 2\u20136). Classification of ships using real and simulated data in a convolutional neural network. 
Proceedings of the 2016 IEEE Radar Conference (RadarConf), Philadelphia, PA, USA.","DOI":"10.1109\/RADAR.2016.7485270"},{"key":"ref_29","unstructured":"Hong, Y., Zhang, K., Gu, J., Bi, S., Zhou, Y., Liu, D., Liu, F., Sunkavalli, K., Bui, T., and Tan, H. (2023). Lrm: Large reconstruction model for single image to 3d. arXiv."},{"key":"ref_30","unstructured":"Poole, B., Jain, A., Barron, J.T., and Mildenhall, B. (2022). Dreamfusion: Text-to-3d using 2d diffusion. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, C.H., Gao, J., Tang, L., Takikawa, T., Zeng, X., Huang, X., Kreis, K., Fidler, S., Liu, M.Y., and Lin, T.Y. (2023, January 17\u201324). Magic3d: High-resolution text-to-3d content creation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00037"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., and Vondrick, C. (2023, January 2\u20136). Zero-1-to-3: Zero-shot one image to 3d object. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00853"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, M., Shi, R., Chen, L., Zhang, Z., Xu, C., Wei, X., Chen, H., Zeng, C., Gu, J., and Su, H. (2024, January 17\u201321). One-2-3-45++: Fast single image to 3d objects with consistent multi-view generation and 3d diffusion. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00960"},{"key":"ref_34","unstructured":"Tochilkin, D., Pankratz, D., Liu, Z., Huang, Z., Letts, A., Li, Y., Liang, D., Laforte, C., Jampani, V., and Cao, Y.P. (2024). Triposr: Fast 3d object reconstruction from a single image. 
arXiv."},{"key":"ref_35","unstructured":"Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., and Sutskever, I. (2021). Zero-Shot Text-to-Image Generation. arXiv."},{"key":"ref_36","unstructured":"Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"2814","DOI":"10.1109\/TKDE.2024.3361474","article-title":"A survey on generative diffusion models","volume":"36","author":"Cao","year":"2024","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_38","unstructured":"Song, J., Meng, C., and Ermon, S. (2020). Denoising diffusion implicit models. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022, January 18\u201324). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"2324","DOI":"10.1109\/TGRS.2019.2947634","article-title":"What, where, and how to transfer in SAR target recognition based on deep CNNs","volume":"58","author":"Huang","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"9842","DOI":"10.1109\/JSTARS.2022.3220875","article-title":"Domain adaptation in remote sensing image classification: A survey","volume":"15","author":"Peng","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Rostami, M., Kolouri, S., Eaton, E., and Kim, K. (2019). Deep transfer learning for few-shot SAR image classification. 
Remote Sens., 11.","DOI":"10.20944\/preprints201905.0030.v1"},{"key":"ref_43","first-page":"307","article-title":"Intelligent technology for aircraft detection and recognition through SAR imagery: Advancements and prospects","volume":"13","author":"Ru","year":"2023","journal-title":"J. Radars"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"153391","DOI":"10.1109\/ACCESS.2019.2948618","article-title":"SAR target recognition based on cross-domain and cross-task transfer learning","volume":"7","author":"Wang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_45","unstructured":"Li, J., Li, D., Xiong, C., and Hoi, S. (2022, January 17\u201323). Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. Proceedings of the International conference on machine learning. PMLR, Baltimore, MD USA."},{"key":"ref_46","unstructured":"Turc, I., Chang, M.W., Lee, K., and Toutanova, K. (2019). Well-read students learn better: On the importance of pre-training compact models. arXiv."},{"key":"ref_47","unstructured":"Wikipedia Contributors (2024, July 26). T-72 Tank at CFB Borden\u2014Wikimedia Commons. Available online: https:\/\/commons.wikimedia.org\/wiki\/File:T72_cfb_borden_1.JPG."},{"key":"ref_48","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). Lora: Low-rank adaptation of large language models. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Caron, M., Touvron, H., Misra, I., J\u00e9gou, H., Mairal, J., Bojanowski, P., and Joulin, A. (2021, January 11\u201317). Emerging properties in self-supervised vision transformers. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chan, E.R., Lin, C.Z., Chan, M.A., Nagano, K., Pan, B., De Mello, S., Gallo, O., Guibas, L.J., Tremblay, J., and Khamis, S. (2022, January 18\u201324). Efficient geometry-aware 3d generative adversarial networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01565"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"2597","DOI":"10.1109\/TAP.2012.2189717","article-title":"Fast 3D-ISAR image simulation of targets at arbitrary aspect angles through nonuniform fast Fourier transform (NUFFT)","volume":"60","author":"He","year":"2012","journal-title":"IEEE Trans. Antennas Propag."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Cui, S., Wang, S., Zhuo, J., Li, L., Huang, Q., and Tian, Q. (2020, January 13\u201319). Towards discriminability and diversity: Batch nuclear-norm maximization under label insufficient situations. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00400"},{"key":"ref_53","unstructured":"Sun, B., and Saenko, K. (2016). Deep coral: Correlation alignment for deep domain adaptation. Computer Vision\u2013ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8\u201310 and 15\u201316, 2016, Proceedings, Part III 14, Springer."},{"key":"ref_54","unstructured":"Ganin, Y., and Lempitsky, V. (2015, January 7\u20139). Unsupervised domain adaptation by backpropagation. 
Proceedings of the International Conference on Machine Learning, PMLR, Lille, France."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1117\/12.321851","article-title":"Moving and stationary target acquisition and recognition (MSTAR) model-based automatic target recognition: Search technology for a robust ATR","volume":"Volume 3370","author":"Diemunsch","year":"1998","journal-title":"Proceedings of the Algorithms for Synthetic Aperture Radar Imagery V"},{"key":"ref_56","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022, January 18\u201324). A convnet for the 2020s. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"ref_58","unstructured":"Karen, S. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/2927\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:34:12Z","timestamp":1760110452000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/2927"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,9]]},"references-count":60,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["rs16162927"],"URL":"https:\/\/doi.org\/10.3390\/rs16162927","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,9]]}}}