{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T23:58:53Z","timestamp":1773187133140,"version":"3.50.1"},"reference-count":48,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2022,9,11]],"date-time":"2022-09-11T00:00:00Z","timestamp":1662854400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Remote sensing image scene classification takes image blocks as classification units and predicts their semantic descriptors. Because it is difficult to obtain enough labeled samples for all classes of remote sensing image scenes, zero-shot classification methods, which can recognize image scenes not seen in the training stage, are of great significance. By projecting the image visual features and the class semantic features into the latent space and ensuring their alignment, the variational autoencoder (VAE) generative model has been applied to address remote-sensing image scene classification under a zero-shot setting. However, the VAE model takes the element-wise square error as the reconstruction loss, which may not be suitable for measuring the reconstruction quality of the visual and semantic features. Therefore, this paper proposes to augment the VAE model with a generative adversarial network (GAN), using the GAN\u2019s discriminator to learn a suitable reconstruction quality metric for the VAE. To promote feature alignment in the latent space, we have also proposed a cross-modal feature-matching loss to ensure that the visual features of one class are aligned with the semantic features of that class and not with those of other classes. Experiments on a public dataset have demonstrated the effectiveness of the proposed improvements. 
Moreover, to investigate the impact of different visual feature extractors, we have tested ResNet18, which extracts 512-dimensional visual features, as well as ResNet50 and ResNet101, both of which extract 2048-dimensional visual features. The experimental results show that ResNet18 achieves better performance, which indicates that deeper extractors and higher-dimensional features may not benefit image scene classification under a zero-shot setting.<\/jats:p>","DOI":"10.3390\/rs14184533","type":"journal-article","created":{"date-parts":[[2022,9,13]],"date-time":"2022-09-13T04:05:41Z","timestamp":1663041941000},"page":"4533","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Integrating Adversarial Generative Network with Variational Autoencoders towards Cross-Modal Alignment for Zero-Shot Remote Sensing Image Scene Classification"],"prefix":"10.3390","volume":"14","author":[{"given":"Suqiang","family":"Ma","sequence":"first","affiliation":[{"name":"The School of Computer and Information Engineering, Henan University, Kaifeng 475000, China"}]},{"given":"Chun","family":"Liu","sequence":"additional","affiliation":[{"name":"The School of Computer and Information Engineering, Henan University, Kaifeng 475000, China"},{"name":"Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng 475004, China"},{"name":"Henan Engineering Laboratory of Spatial Information Processing, Henan University, Kaifeng 475004, China"},{"name":"Henan Industrial Technology Academy of Spatio-Temporal Big Data, Henan University, Zhengzhou 450046, China"}]},{"given":"Zheng","family":"Li","sequence":"additional","affiliation":[{"name":"The School of Computer and Information Engineering, Henan University, Kaifeng 475000, China"},{"name":"Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng 475004, 
China"},{"name":"Henan Engineering Laboratory of Spatial Information Processing, Henan University, Kaifeng 475004, China"}]},{"given":"Wei","family":"Yang","sequence":"additional","affiliation":[{"name":"The School of Computer and Information Engineering, Henan University, Kaifeng 475000, China"},{"name":"Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng 475004, China"},{"name":"Henan Engineering Laboratory of Spatial Information Processing, Henan University, Kaifeng 475004, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"0428001","DOI":"10.3788\/AOS201636.0428001","article-title":"High spatial resolution remote sensing image classification based on deep learning","volume":"36","author":"Liu","year":"2016","journal-title":"Acta Opt. Sin."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1947","DOI":"10.1109\/TGRS.2014.2351395","article-title":"Pyramid of spatial relatons for scene-level land use classification","volume":"53","author":"Chen","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1080\/01431161.2012.705443","article-title":"Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA","volume":"34","author":"Cheng","year":"2013","journal-title":"Int. J. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/j.rse.2018.05.006","article-title":"Integrating bottom-up classification and top-down feedback for improving urban land-cover and functional-zone mapping","volume":"212","author":"Zhang","year":"2018","journal-title":"Remote Sens. 
Environ."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"14680","DOI":"10.3390\/rs71114680","article-title":"Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery","volume":"7","author":"Hu","year":"2015","journal-title":"Remote Sens."},{"key":"ref_6","unstructured":"Castelluccio, M., Poggi, G., Sansone, C., and Verdoliva, L. (2015). Land use classification in remote sensing images by convolutional neural networks. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Penatti, O.A., Nogueira, K., and Dos Santos, J.A. (2015, January 7\u201312). Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301382"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"4157","DOI":"10.1109\/TGRS.2017.2689071","article-title":"Zero-shot scene classification for high spatial resolution remote sensing images","volume":"55","author":"Li","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2009","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhang, H., Xiao, J., Liu, W., and Chang, S.F. (2018, January 18\u201323). Zero-shot visual recognition using semantics-preserving adversarial embedding networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00115"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Das, D., and Lee, C.G. (2019, January 14\u201319). Zero-shot image recognition using relational matching, adaptation and calibration. 
Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary.","DOI":"10.1109\/IJCNN.2019.8852315"},{"key":"ref_12","first-page":"135","article-title":"Improving zero shot learning by mitigating the hubness problem","volume":"9284","author":"Ding","year":"2014","journal-title":"Comput. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2332","DOI":"10.1109\/TPAMI.2015.2408354","article-title":"Transductive multi-view zero-shot learning","volume":"37","author":"Fu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Huang, H., Wang, C., Yu, P.S., and Wang, C.D. (2019, January 16\u201317). Generative dual adversarial network for generalized zero-shot learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00089"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Schonfeld, E., Ebrahimi, S., Sinha, S., Darrell, T., and Akata, Z. (2019, January 16\u201317). Generalized zero-and few-shot learning via aligned variational autoencoders. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00844"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"10590","DOI":"10.1109\/TGRS.2020.3047447","article-title":"Learning deep cross-modal embedding networks for zero-shot remote sensing image scene classification","volume":"59","author":"Li","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2811","DOI":"10.1109\/TGRS.2017.2783902","article-title":"When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs","volume":"56","author":"Cheng","year":"2018","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhao, H., Sun, X., Gao, F., and Dong, J. (2022). Pair-Wise Similarity Knowledge Distillation for RSI Scene Classification. Remote Sens., 14.","DOI":"10.3390\/rs14102483"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/j.isprsjprs.2021.08.001","article-title":"Robust deep alignment network with remote sensing knowledge graph for zero-shot and generalized zero-shot remote sensing image scene classification","volume":"179","author":"Li","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_20","unstructured":"Larsen, A.B.L., S\u00f8nderby, S.K., Larochelle, H., and Winther, O. (2016, January 19\u201324). Autoencoding beyond pixels using a learned similarity metric. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_21","unstructured":"Kingma, D.P., and Welling, M. (2014). Auto-encoding variational bayes. arXiv."},{"key":"ref_22","first-page":"139","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kodirov, E., Xiang, T., and Gong, S. (2017, January 21\u201326). Semantic autoencoder for zero-shot learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.473"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Speer, R., and Havasi, C. (2013). ConceptNet 5: A large semantic network for relational knowledge. The People\u2019s Web Meets NLP, Springer.","DOI":"10.1007\/978-3-642-35085-6_6"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lampert, C.H., Nickisch, H., and Harmeling, S. (2009, January 20\u201325). Learning to detect unseen object classes by between-class attribute transfer. 
Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206594"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Romera-Paredes, B., and Torr, P. (2015, January 6\u201311). An embarrassingly simple approach to zero-shot learning. Proceedings of the International Conference on Machine Learning, Lille, France.","DOI":"10.1007\/978-3-319-50077-5_2"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., and Schiele, B. (2016, January 27\u201330). Latent embeddings for zero-shot classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.15"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Saligrama, V. (2015, January 11\u201318). Zero-shot learning via semantic similarity embedding. Proceedings of the IEEE International Conference on Computer Vision, Araucano Park, Chile.","DOI":"10.1109\/ICCV.2015.474"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, L., Xiang, T., and Gong, S. (2017, January 21\u201326). Learning a deep embedding model for zero-shot learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.321"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Xian, Y., Lorenz, T., Schiele, B., and Akata, Z. (2018, January 18\u201323). Feature generating networks for zero-shot learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00581"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"107352","DOI":"10.1016\/j.asoc.2021.107352","article-title":"Dual VAEGAN: A generative model for generalized zero-shot learning","volume":"107","author":"Luo","year":"2021","journal-title":"Appl. 
Soft Comput."},{"key":"ref_32","first-page":"100278","article-title":"Zero-shot image classification using coupled dictionary embedding","volume":"8","author":"Rostami","year":"2022","journal-title":"Mach. Learn. Appl."},{"key":"ref_33","unstructured":"Liu, Y., Gao, X., and Han, J. (2022). A Discriminative Cross-Aligned Variational Autoencoder for Zero-Shot Learning. IEEE Trans. Cybern., 1\u201312."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"0610002","DOI":"10.3788\/AOS201939.0610002","article-title":"Image Feature Fusion Based Remote Sensing Scene Zero-Shot Classification Algorithm","volume":"39","author":"Chen","year":"2019","journal-title":"Acta Opt. Sin."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Quan, J., Wu, C., Wang, H., and Wang, Z. (2018, January 10\u201312). Structural alignment based zero-shot classification for remote sensing scenes. Proceedings of the 2018 IEEE International Conference on Electronics and Communication Engineering (ICECE), Xi\u2019an, China.","DOI":"10.1109\/ICECOME.2018.8645056"},{"key":"ref_36","first-page":"286","article-title":"Word Vectors Fusion Based Remote Sensing Scenes Zero-shot Classification Algorithm","volume":"46","author":"Chen","year":"2019","journal-title":"Comput. Sci."},{"key":"ref_37","first-page":"1564","article-title":"Zero-shot remote sensing image scene classification based on robust cross-domain mapping and gradual refinement of semantic space","volume":"49","author":"Li","year":"2020","journal-title":"Acta Geod. Cartogr. Sin."},{"key":"ref_38","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv."},{"key":"ref_39","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S. (2010, January 2\u20135). 
Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA.","DOI":"10.1145\/1869790.1869829"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A benchmark data set for performance evaluation of aerial scene classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1109\/JPROC.2017.2675998","article-title":"Remote sensing image scene classification: Benchmark and state of the art","volume":"105","author":"Cheng","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Li, H., Dou, X., Tao, C., Wu, Z., Chen, J., Peng, J., Deng, M., and Zhao, L. (2020). RSI-CB: A large-scale remote sensing image classification benchmark using crowdsourced data. Sensors, 20.","DOI":"10.3390\/s20061594"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.isprsjprs.2018.01.004","article-title":"PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval","volume":"145","author":"Zhou","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Tao, S.Y., Yeh, Y.R., and Wang, Y. (2017, January 4\u20137). Semantics-Preserving Locality Embedding for Zero-Shot Learning. 
Proceedings of the British Machine Vision Conference, London, UK.","DOI":"10.5244\/C.31.3"},{"key":"ref_47","unstructured":"Elhoseiny, M., and Elfeki, M. (2019, October 27\u2013November 2). Creativity Inspired Zero-Shot Learning. Proceedings of the International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_48","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/18\/4533\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:29:22Z","timestamp":1760142562000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/18\/4533"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,11]]},"references-count":48,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["rs14184533"],"URL":"https:\/\/doi.org\/10.3390\/rs14184533","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,11]]}}}