{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T19:14:29Z","timestamp":1774120469960,"version":"3.50.1"},"reference-count":67,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2022,5,18]],"date-time":"2022-05-18T00:00:00Z","timestamp":1652832000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100016692","name":"Key Research and Development Program of Ningxia Hui Autonomous Region (Key Technologies for Intelligent Monitoring of Spatial Planning Based on High-Resolution Remote Sensing)","doi-asserted-by":"publisher","award":["2019BFG02009"],"award-info":[{"award-number":["2019BFG02009"]}],"id":[{"id":"10.13039\/100016692","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>The results of aerial scene classification can provide valuable information for urban planning and land monitoring. In this field, large remote-sensing images typically contain many object-level semantic classes. A complex label space makes it hard to detect all the targets and perceive the corresponding semantics in a typical scene, which weakens the sensing ability. Even worse, preparing a labeled dataset for training deep networks is more difficult because each image carries multiple labels. To mine object-level visual features and make good use of label dependency, we propose a novel framework in this article, namely a Cross-Modal Representation Learning and Label Graph Mining-based Residual Multi-Attentional CNN-LSTM framework (CM-GM framework). In this framework, a residual multi-attentional convolutional neural network is developed to extract object-level image features. 
Moreover, semantic labels are embedded by a language model and then form a label graph, which is further mapped by a graph convolutional network (GCN). With these cross-modal feature representations (image, graph and text), object-level visual features are enhanced and aligned with the GCN-based label embeddings. The aligned visual signals are then fed into a bi-LSTM subnetwork according to the constructed label graph. The CM-GM framework maps both visual features and graph-based label representations into a correlated space and exploits label dependency efficiently, thus improving the LSTM predictor\u2019s ability. Experimental results show that the proposed CM-GM framework achieves higher accuracy on many multi-label benchmark datasets in the remote sensing field.<\/jats:p>","DOI":"10.3390\/rs14102424","type":"journal-article","created":{"date-parts":[[2022,5,18]],"date-time":"2022-05-18T23:14:26Z","timestamp":1652915666000},"page":"2424","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Cross-Modal Feature Representation Learning and Label Graph Mining in a Residual Multi-Attentional CNN-LSTM Network for Multi-Label Aerial Scene Classification"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5453-7389","authenticated-orcid":false,"given":"Peng","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China"},{"name":"Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0519-169X","authenticated-orcid":false,"given":"Peng","family":"Chen","sequence":"additional","affiliation":[{"name":"Financial Technology Innovation Department, Postal Savings Bank of China, Beijing 100808, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3456-5259","authenticated-orcid":false,"given":"Dezheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China"},{"name":"Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.isprsjprs.2019.04.017","article-title":"Social sensing from street-level imagery: A case study in learning spatio-temporal urban mobility patterns","volume":"153","author":"Zhang","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Ghoussein, Y., Nicolas, H., Haury, J., Fadel, A., Pichelin, P., Hamdan, H.A., and Faour, G. (2019). Multitemporal Remote Sensing Based on an FVC Reference Period Using Sentinel-2 for Monitoring Eichhornia crassipes Mediterranean River. Remote Sens., 11.","DOI":"10.3390\/rs11161856"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R.B. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S.K., Girshick, R.B., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_6","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1080\/2150704X.2017.1415474","article-title":"Aircraft detection in remote sensing images based on a deep residual network and Super-Vector coding","volume":"9","author":"Yang","year":"2018","journal-title":"Remote Sens. Lett."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"6899","DOI":"10.1109\/TGRS.2018.2845668","article-title":"Remote Sensing Scene Classification Using Multilayer Stacked Covariance Pooling","volume":"56","author":"He","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"7109","DOI":"10.1109\/TGRS.2018.2848473","article-title":"Scene Classification Based on Multiscale Convolutional Neural Network","volume":"56","author":"Liu","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Durand, T., Mehrasa, N., and Mori, G. (2019, January 16\u201320). Learning a Deep ConvNet for Multi-label Classification with Partial Labels. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00074"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.1109\/TKDE.2013.39","article-title":"A Review on Multi-Label Learning Algorithms","volume":"26","author":"Zhang","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Elisseeff, A., and Weston, J. 
(2001, January 3\u20138). A kernel method for multi-labelled classification. Proceedings of the Advances in Neural Information Processing Systems 14, Neural Information Processing Systems: Natural and Synthetic, NIPS 2001, Vancouver, BC, Canada.","DOI":"10.7551\/mitpress\/1120.003.0092"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1897","DOI":"10.1016\/j.artint.2008.08.002","article-title":"Label ranking by learning pairwise preferences","volume":"172","author":"Cheng","year":"2008","journal-title":"Artif. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, J., Yang, Y., Mao, J., Huang, Z., Huang, C., and Xu, W. (2016, January 27\u201330). CNN-RNN: A Unified Framework for Multi-label Image Classification. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.251"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chen, Z., Wei, X., Wang, P., and Guo, Y. (2019, January 16\u201320). Multi-Label Image Recognition with Graph Convolutional Networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00532"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S.D. (2010, January 3\u20135). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, San Jose, CA, USA.","DOI":"10.1145\/1869790.1869829"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1109\/TGRS.2017.2760909","article-title":"Multilabel Remote Sensing Image Retrieval Using a Semisupervised Graph-Theoretic Method","volume":"56","author":"Chaudhuri","year":"2018","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_18","unstructured":"Duvenaud, D., Maclaurin, D., Aguilera-Iparraguirre, J., G\u00f3mez-Bombarelli, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R.P. (2015, January 7\u201312). Convolutional Networks on Graphs for Learning Molecular Fingerprints. Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zheng, H., Fu, J., Mei, T., and Luo, J. (2017, January 22\u201329). Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.557"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Song, S., Xu, B., and Yang, J. (2016). SAR Target Recognition via Supervised Discriminative Dictionary Learning and Sparse Representation of the SAR-HOG Feature. Remote Sens., 8.","DOI":"10.3390\/rs8080683"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1007\/BF00130487","article-title":"Color indexing","volume":"7","author":"Swain","year":"1991","journal-title":"Int. J. Comput. Vis."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive Image Features from Scale-Invariant Keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1023\/A:1011139631724","article-title":"Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope","volume":"42","author":"Oliva","year":"2001","journal-title":"Int. J. Comput. Vis."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1109\/LGRS.2009.2023536","article-title":"Semantic Annotation of Satellite Images Using Latent Dirichlet Allocation","volume":"7","author":"Datcu","year":"2010","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_26","unstructured":"Li, F., and Perona, P. (2005, January 20\u201326). A Bayesian Hierarchical Model for Learning Natural Scene Categories. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, J., Zhang, J., Dai, T., and He, Z. (2019). Exploring Weighted Dual Graph Regularized Non-Negative Matrix Tri-Factorization Based Collaborative Filtering Framework for Multi-Label Annotation of Remote Sensing Images. Remote Sens., 11.","DOI":"10.3390\/rs11080922"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1356","DOI":"10.1109\/TGRS.2013.2250978","article-title":"Semantic Annotation of Satellite Images Using Author-Genre-Topic Model","volume":"52","author":"Luo","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2038","DOI":"10.1016\/j.patcog.2006.12.019","article-title":"ML-KNN: A lazy learning approach to multi-label learning","volume":"40","author":"Zhang","year":"2007","journal-title":"Pattern Recognit."},{"key":"ref_30","unstructured":"Tieu, K., and Viola, P.A. (2000, January 13\u201315). Boosting Image Retrieval. 
Proceedings of the 2000 Conference on Computer Vision and Pattern Recognition (CVPR 2000), Hilton Head, SC, USA."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1285","DOI":"10.1109\/LGRS.2012.2237502","article-title":"Semantic Annotation of High-Resolution Remote Sensing Images via Gaussian Process Multi-Instance Multilabel Learning","volume":"10","author":"Chen","year":"2013","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1080\/01431161.2012.705443","article-title":"Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA","volume":"34","author":"Cheng","year":"2013","journal-title":"Int. J. Remote Sens."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_34","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1109\/LGRS.2018.2795531","article-title":"Fully Convolutional Networks for Semantic Segmentation of Very High Resolution Remotely Sensed Images Combined With DSM","volume":"15","author":"Sun","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_37","unstructured":"Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K. (2015, January 7\u201312). 
Spatial Transformer Networks. Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-Excitation Networks. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Fang, Z., and Lu, H. (2019, January 16\u201320). Dual Attention Network for Scene Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional Block Attention Module. Proceedings of the Computer Vision\u2014ECCV 2018\u201415th European Conference, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., and Tang, X. (2017, January 21\u201326). Residual Attention Network for Image Classification. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.683"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lee, C., Fang, W., Yeh, C., and Wang, Y.F. (2018, January 18\u201322). Multi-Label Zero-Shot Learning With Structured Knowledge Graphs. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00170"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Tan, Q., Liu, Y., Chen, X., and Yu, G. (2017). 
Multi-Label Classification Based on Low Rank Representation for Image Annotation. Remote Sens., 9.","DOI":"10.3390\/rs9020109"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1016\/j.isprsjprs.2019.01.015","article-title":"Recurrently exploring class-wise attention in a hybrid convolutional and bidirectional LSTM network for multi-label aerial image classification","volume":"149","author":"Hua","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_45","unstructured":"Gori, M., Monfardini, G., and Scarselli, F. (2005, July 31\u2013August 4). A new model for learning in graph domains. Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","article-title":"The Graph Neural Network Model","volume":"20","author":"Scarselli","year":"2009","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1109\/TNN.2008.2010350","article-title":"Neural Network for Graphs: A Contextual Constructive Approach","volume":"20","author":"Micheli","year":"2009","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_48","unstructured":"Lipton, Z.C. (2015). A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Cho, K., van Merrienboer, B., G\u00fcl\u00e7ehre, \u00c7., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014, January 25\u201329). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_50","unstructured":"Defferrard, M., Bresson, X., and Vandergheynst, P. (2016, January 5\u201310). 
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain."},{"key":"ref_51","unstructured":"Kipf, T.N., and Welling, M. (2017, January 24\u201326). Semi-Supervised Classification with Graph Convolutional Networks. Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Yu, B., Yin, H., and Zhu, Z. (2018, January 13\u201319). Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/505"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Perozzi, B., Al-Rfou, R., and Skiena, S. (2014, January 24\u201327). DeepWalk: Online learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD \u201914, New York, NY, USA.","DOI":"10.1145\/2623330.2623732"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Grover, A., and Leskovec, J. (2016, January 13\u201317). node2vec: Scalable Feature Learning for Networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939754"},{"key":"ref_55","unstructured":"Bordes, A., Usunier, N., Garc\u00eda-Dur\u00e1n, A., Weston, J., and Yakhnenko, O. (2013, January 5\u20138). Translating Embeddings for Modeling Multi-relational Data. 
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1109\/TPAMI.2018.2798607","article-title":"Multimodal Machine Learning: A Survey and Taxonomy","volume":"41","author":"Baltrusaitis","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1080\/10350339909360442","article-title":"Interdependence, interaction and metaphor in multisemiotic texts","volume":"9","year":"1999","journal-title":"Soc. Semiot."},{"key":"ref_58","unstructured":"Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_59","unstructured":"Nair, V., and Hinton, G.E. (2010, January 21\u201324). Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel."},{"key":"ref_60","first-page":"2579","article-title":"Visualizing High-Dimensional Data Using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_61","unstructured":"Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Ranzato, M., and Mikolov, T. (2013, January 5\u20138). DeViSE: A Deep Visual-Semantic Embedding Model. Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA."},{"key":"ref_62","unstructured":"Norouzi, M., Mikolov, T., Bengio, S., Singer, Y., Shlens, J., Frome, A., Corrado, G., and Dean, J. (2014, January 14\u201316). Zero-Shot Learning by Convex Combination of Semantic Embeddings. 
Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"694","DOI":"10.1109\/LGRS.2017.2671922","article-title":"A Deep Learning Approach to UAV Image Multilabeling","volume":"14","author":"Zeggada","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_64","unstructured":"Gong, Y., Jia, Y., Leung, T., Toshev, A., and Ioffe, S. (2014, January 14\u201316). Deep Convolutional Ranking for Multilabel Image Annotation. Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Li, Y., Song, Y., and Luo, J. (2017, January 21\u201326). Improving Pairwise Ranking for Multi-label Image Classification. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.199"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Sumbul, G., Charfuelan, M., Demir, B., and Markl, V. (2019, July 28\u2013August 2). BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding. Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8900532"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. 
Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/10\/2424\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:14:21Z","timestamp":1760138061000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/10\/2424"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,18]]},"references-count":67,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["rs14102424"],"URL":"https:\/\/doi.org\/10.3390\/rs14102424","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,18]]}}}