{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T23:25:48Z","timestamp":1774049148526,"version":"3.50.1"},"reference-count":81,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,4,22]],"date-time":"2023-04-22T00:00:00Z","timestamp":1682121600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1964203"],"award-info":[{"award-number":["U1964203"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2022YFB2503004"],"award-info":[{"award-number":["2022YFB2503004"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["U1964203"],"award-info":[{"award-number":["U1964203"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFB2503004"],"award-info":[{"award-number":["2022YFB2503004"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Geolocation is a fundamental component of route planning and navigation for unmanned vehicles, but GNSS-based geolocation fails under denial-of-service conditions. Cross-view geo-localization (CVGL), which aims to estimate the geographic location of the ground-level camera by matching against enormous geo-tagged aerial (e.g., satellite) images, has received a lot of attention but remains extremely challenging due to the drastic appearance differences across aerial\u2013ground views. In existing methods, global representations of different views are extracted primarily using Siamese-like architectures, but their interactive benefits are seldom taken into account. In this paper, we present a novel approach using cross-view knowledge generative techniques in combination with transformers, namely mutual generative transformer learning (MGTL), for CVGL. Specifically, by taking the initial representations produced by the backbone network, MGTL develops two separate generative sub-modules\u2014one for aerial-aware knowledge generation from ground-view semantics and vice versa\u2014and fully exploits the entirely mutual benefits through the attention mechanism. Moreover, to better capture the co-visual relationships between aerial and ground views, we introduce a cascaded attention masking algorithm to further boost accuracy. Extensive experiments on challenging public benchmarks, i.e., CVACT and CVUSA, demonstrate the effectiveness of the proposed method, which sets new records compared with the existing state-of-the-art models. Our code will be available upon acceptance.<\/jats:p>","DOI":"10.3390\/rs15092221","type":"journal-article","created":{"date-parts":[[2023,4,24]],"date-time":"2023-04-24T02:06:11Z","timestamp":1682301971000},"page":"2221","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Co-Visual Pattern-Augmented Generative Transformer Learning for Automobile Geo-Localization"],"prefix":"10.3390","volume":"15","author":[{"given":"Jianwei","family":"Zhao","sequence":"first","affiliation":[{"name":"School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"},{"name":"Center for Robotics, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5328-675X","authenticated-orcid":false,"given":"Qiang","family":"Zhai","sequence":"additional","affiliation":[{"name":"School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"},{"name":"Center for Robotics, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2353-3251","authenticated-orcid":false,"given":"Pengbo","family":"Zhao","sequence":"additional","affiliation":[{"name":"McCormick School of Engineering, Northwestern University, Evanston, IL 60611, USA"}]},{"given":"Rui","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"},{"name":"Center for Robotics, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5532-9530","authenticated-orcid":false,"given":"Hong","family":"Cheng","sequence":"additional","affiliation":[{"name":"School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"},{"name":"Center for Robotics, University of Electronic Science and Technology of China, Chengdu 611731, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1007\/s11263-015-0830-0","article-title":"Image based geo-localization in the alps","volume":"116","author":"Saurer","year":"2016","journal-title":"Int. J. Comput. Vis."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Senlet, T., and Elgammal, A. (2012, January 14\u201319). Satellite image-based precise robot localization on sidewalks. Proceedings of the IEEE International Conference on Robotics and Automation, St Paul, MN, USA.","DOI":"10.1109\/ICRA.2012.6225352"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1109\/TITS.2020.3013234","article-title":"Multimodal end-to-end autonomous driving","volume":"23","author":"Xiao","year":"2020","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_4","unstructured":"Wang, S., Zhang, Y., and Li, H. (2022). Satellite image based cross-view localization for autonomous vehicle. arXiv."},{"key":"ref_5","unstructured":"Thoma, J., Paudel, D.P., Chhatkuli, A., Probst, T., and Gool, L.V. (November, January 27). Mapping, localization and path planning for image-based navigation using visual features and map. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Roy, N., and Debarshi, S. (2020, January 27\u201328). Uav-based person re-identification and dynamic image routing using wireless mesh networking. Proceedings of the 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN) IEEE, Noida, India.","DOI":"10.1109\/SPIN48934.2020.9071078"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1205","DOI":"10.1007\/s11263-019-01186-0","article-title":"Image-based geo-localization using satellite imagery","volume":"128","author":"Hu","year":"2020","journal-title":"IJCV"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., and Sivic, J. (2016, January 27\u201330). NetVLAD: CNN architecture for weakly supervised place recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.572"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Workman, S., and Jacobs, N. (2015, January 8\u201310). On the location dependence of convolutional neural network features. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301385"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Vo, N.N., and Hays, J. (2016, January 8\u201316). Localizing and orienting street views using overhead imagery. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_30"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Hu, S., Feng, M., Nguyen, R.M., and Lee, G.H. (2018, January 18\u201322). Cvm-net: Cross-view matching network for image-based ground-to-aerial geo-localization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00758"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Regmi, K., and Shah, M. (2019, January 16\u201320). Bridging the domain gap for ground-to-aerial image matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00056"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhu, S., Shah, M., and Chen, C. (2022, January 19\u201323). TransGeo: Transformer Is all You Need for Cross-view Image Geo-localization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00123"},{"key":"ref_14","first-page":"29009","article-title":"Cross-view Geo-localization with Layer-to-Layer Transformer","volume":"34","author":"Yang","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_15","unstructured":"Chen, Z., Lam, O., Jacobson, A., and Milford, M. (2014). Convolutional neural network-based place recognition. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Xin, Z., Cai, Y., Lu, T., Xing, X., Cai, S., Zhang, J., Yang, Y., and Wang, Y. (2019, January 20\u201324). Localizing Discriminative Visual Landmarks for Place Recognition. Proceedings of the IEEE International Conference on Robotics and Automation, Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794383"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"3882","DOI":"10.1109\/LRA.2022.3147257","article-title":"MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery","volume":"7","author":"Khaliq","year":"2022","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1109\/TNNLS.2019.2908982","article-title":"Spatial pyramid-enhanced NetVLAD with weighted triplet loss for place recognition","volume":"31","author":"Yu","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Latif, Y., Garg, R., Milford, M., and Reid, I. (2018, January 21\u201326). Addressing challenging place recognition tasks using generative adversarial networks. Proceedings of the International Conference on Robotics and Automation, Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461081"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Castaldo, F., Zamir, A., Angst, R., Palmieri, F., and Savarese, S. (2015, January 7\u201313). Semantic cross-view matching. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Santiago, Chile.","DOI":"10.1109\/ICCVW.2015.137"},{"key":"ref_21","unstructured":"Mousavian, A., and Kosecka, J. (2016). Semantic Image Based Geolocation Given a Map. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhu, S., Yang, T., and Chen, C. (2021, January 19\u201325). Vigor: Cross-view image geo-localization beyond one-to-one retrieval. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online.","DOI":"10.1109\/CVPR46437.2021.00364"},{"key":"ref_23","unstructured":"Shi, Y., Liu, L., Yu, X., and Li, H. (2019, January 8\u201314). Spatial-aware feature aggregation for image based cross-view geo-localization. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Shi, Y., Yu, X., Liu, L., Zhang, T., and Li, H. (2020, January 7\u201312). Optimal feature transport for cross-view image geo-localization. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6875"},{"key":"ref_25","unstructured":"Wang, T., Fan, S., Liu, D., and Sun, C. (2022). Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"867","DOI":"10.1109\/TCSVT.2021.3061265","article-title":"Each part matters: Local patterns facilitate cross-view geo-localization","volume":"32","author":"Wang","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_27","unstructured":"Wang, T., Zheng, Z., Zhu, Z., Gao, Y., Yang, Y., and Yan, C. (2022). Learning Cross-view Geo-localization Embeddings via Dynamic Weighted Decorrelation Regularization. arXiv."},{"key":"ref_28","unstructured":"Zhu, Y., Yang, H., Lu, Y., and Huang, Q. (2023). Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization. arXiv."},{"key":"ref_29","unstructured":"Zhang, X., Li, X., Sultani, W., Zhou, Y., and Wshah, S. (2022). Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Workman, S., Souvenir, R., and Jacobs, N. (2015, January 8\u201310). Wide-area image geolocalization with aerial reference imagery. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/ICCV.2015.451"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, L., and Li, H. (2019, January 16\u201320). Lending orientation to neural networks for cross-view geo-localization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00577"},{"key":"ref_32","first-page":"1","article-title":"Geographic Semantic Network for Cross-View Image Geo-Localization","volume":"60","author":"Zhu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhu, B., Yang, C., Dai, J., Fan, J., and Ye, Y. (2023). R2FD2: Fast and Robust Matching of Multimodal Remote Sensing Image via Repeatable Feature Detector and Rotation-invariant Feature Descriptor. IEEE Trans. Geosci. Remote Sens.","DOI":"10.1109\/TGRS.2023.3264610"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Regmi, K., and Borji, A. (2018, January 18\u201322). Cross-view image synthesis using conditional gans. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00369"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lu, X., Li, Z., Cui, Z., Oswald, M.R., Pollefeys, M., and Qin, R. (2020, January 14\u201319). Geometry-aware satellite-to-ground image synthesis for urban areas. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00094"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ding, H., Wu, S., Tang, H., Wu, F., Gao, G., and Jing, X.Y. (2020, January 16\u201318). Cross-view image synthesis with deformable convolution and attention mechanism. Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Nanjing, China.","DOI":"10.1007\/978-3-030-60633-6_32"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Cui, Y., Belongie, S., and Hays, J. (2015, January 8\u201310). Learning deep representations for ground-to-aerial geolocalization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299135"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Sun, B., Chen, C., Zhu, Y., and Jiang, J. (2019). GeoCapsNet: Aerial to Ground view Image Geo-localization using Capsule Network. arXiv.","DOI":"10.1109\/ICME.2019.00133"},{"key":"ref_39","unstructured":"Cai, S., Guo, Y., Khan, S., Hu, J., and Wen, G. (November, January 27). Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_40","unstructured":"Ren, B., Tang, H., and Sebe, N. (2021). Cascaded cross mlp-mixer gans for cross-view image translation. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Toker, A., Zhou, Q., Maximov, M., and Leal-Taix\u00e9, L. (2021, January 11\u201317). Coming down to earth: Satellite-to-street view synthesis for geo-localization. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/CVPR46437.2021.00642"},{"key":"ref_42","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention Is All You Need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_43","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Chen, H., Wang, Y., Guo, T., Xu, C., Deng, Y., Liu, Z., Ma, S., Xu, C., Xu, C., and Gao, W. (2021, January 19\u201325). Pre-trained image processing transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online.","DOI":"10.1109\/CVPR46437.2021.01212"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Bhojanapalli, S., Chakrabarti, A., Glasner, D., Li, D., Unterthiner, T., and Veit, A. (2021, January 11\u201317). Understanding robustness of transformers for image classification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.01007"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Lanchantin, J., Wang, T., Ordonez, V., and Qi, Y. (2021, January 19\u201325). General multi-label image classification with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online.","DOI":"10.1109\/CVPR46437.2021.01621"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Strudel, R., Pinel, R.G., Laptev, I., and Schmid, C. (2021, January 11\u201317). Segmenter: Transformer for Semantic Segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.00717"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/j.patrec.2021.04.024","article-title":"Trseg: Transformer for semantic segmentation","volume":"148","author":"Jin","year":"2021","journal-title":"Pattern Recognit. Lett."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H. (2021, January 20\u201325). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Online.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Misra, I., Girdhar, R., and Joulin, A. (2021, January 11\u201317). An end-to-end transformer model for 3d object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.00290"},{"key":"ref_52","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1109\/LSP.2022.3146798","article-title":"Light field image super-resolution with transformers","volume":"29","author":"Liang","year":"2022","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., and Yang, M.H. (2022, January 19\u201323). Restormer: Efficient transformer for high-resolution image restoration. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00564"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Li, Z., Liu, X., Drenkow, N., Ding, A., Creighton, F.X., Taylor, R.H., and Unberath, M. (2021, January 11\u201317). Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.00614"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Ding, Y., Yuan, W., Zhu, Q., Zhang, H., Liu, X., Wang, Y., and Liu, X. (2022, January 19\u201323). Transmvsnet: Global context-aware multi-view stereo network with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00839"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"He, X., Chen, Y., and Lin, Z. (2021). Spatial-spectral transformer for hyperspectral image classification. Remote Sens., 13.","DOI":"10.3390\/rs13030498"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Qing, Y., Liu, W., Feng, L., and Gao, W. (2021). Improved transformer net for hyperspectral image classification. Remote Sens., 13.","DOI":"10.3390\/rs13112216"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2022.3231215","article-title":"Spectral-spatial feature tokenization transformer for hyperspectral image classification","volume":"60","author":"Sun","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_60","first-page":"1","article-title":"Multispectral fusion transformer network for RGB-thermal urban scene semantic segmentation","volume":"19","author":"Zhou","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_61","unstructured":"Chu, X., Tian, Z., Zhang, B., Wang, X., Wei, X., Xia, H., and Shen, C. (2021). Conditional positional encodings for vision transformers. arXiv."},{"key":"ref_62","unstructured":"Li, Y., Zhang, K., Cao, J., Timofte, R., and Van Gool, L. (2021). Localvit: Bringing locality to vision transformers. arXiv."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Chen, C.F.R., Fan, Q., and Panda, R. (2021, January 11\u201317). Crossvit: Cross-attention multi-scale vision transformer for image classification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.00041"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Yang, F., Zhai, Q., Li, X., Huang, R., Luo, A., Cheng, H., and Fan, D.P. (2021, January 11\u201317). Uncertainty-guided transformer reasoning for camouflaged object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.00411"},{"key":"ref_66","unstructured":"Wang, W., Yao, L., Chen, L., Cai, D., He, X., and Liu, W. (2021). CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention. arXiv."},{"key":"ref_67","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"6024","DOI":"10.1109\/TPAMI.2021.3085766","article-title":"Concealed object detection","volume":"44","author":"Fan","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the IEEE European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Zhai, M., Bessinger, Z., Workman, S., and Jacobs, N. (2017, January 21\u201326). Predicting ground-level scene layout from aerial imagery. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.440"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Shi, Y., Yu, X., Campbell, D., and Li, H. (2020, January 13\u201319). Where am I looking At? Joint location and orientation estimation by cross-view matching. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00412"},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"3780","DOI":"10.1109\/TIP.2022.3175601","article-title":"Joint Representation Learning and Keypoint Detection for Cross-View Geo-Localization","volume":"31","author":"Lin","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_75","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-encoding variational bayes. arXiv."},{"key":"ref_76","unstructured":"Jie, H., Li, S., and Gang, S. (2018, January 18\u201323). Squeeze-and-Excitation Networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA."},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Liu, J.J., Hou, Q., Cheng, M.M., Wang, C., and Feng, J. (2020, January 14\u201319). Improving Convolutional Networks With Self-Calibrated Convolutions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01011"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201322). Non-local neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, W., Hu, X., and Yang, J. (2019, January 16\u201320). Selective Kernel Networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00060"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Liu, S., Huang, D., and Wang, Y. (2018, January 8\u201314). Receptive field block net for accurate and fast object detection. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01252-6_24"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wei, Y., and Yang, Y. (2020, January 12\u201316). University-1652: A multi-view multi-source benchmark for drone-based geo-localization. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413896"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/9\/2221\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:21:13Z","timestamp":1760124073000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/9\/2221"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,22]]},"references-count":81,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["rs15092221"],"URL":"https:\/\/doi.org\/10.3390\/rs15092221","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,22]]}}}