{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T23:25:09Z","timestamp":1771025109554,"version":"3.50.1"},"reference-count":38,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,1,21]],"date-time":"2023-01-21T00:00:00Z","timestamp":1674259200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2018AAA0100602"],"award-info":[{"award-number":["2018AAA0100602"]}]},{"name":"National Natural Science Foundation of China","award":["41927805"],"award-info":[{"award-number":["41927805"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>In recent years, many multimodal studies in the field of remote sensing have achieved good results on the land-cover classification task. However, multi-scale information is seldom considered in the multimodal fusion process. Moreover, attention mechanisms are rarely applied in multimodal fusion, which weakens the representation of the fused features. To make better use of multimodal data and reduce the losses caused by fusing different modalities, we propose a TRMSF (Transformer and Multi-scale Fusion) network for joint land-cover classification of HSI (hyperspectral images) and LiDAR (Light Detection and Ranging) images. 
The network enhances multimodal information fusion by applying the attention mechanism of the Transformer and uses multi-scale information to fuse features from different modal structures. The network consists of three parts: a multi-scale attention enhancement module (MSAE), a multimodality fusion module (MMF) and a multi-output module (MOM). MSAE enhances feature representation by extracting multi-scale features from the HSI, each of which is fused with the LiDAR features. MMF integrates data from different modalities through an attention mechanism, thereby reducing the loss caused by fusing data with different modal structures. MOM optimizes the network by controlling different outputs and enhances the stability of the results. Experimental results show that the proposed network is effective for multimodal joint classification.<\/jats:p>","DOI":"10.3390\/rs15030650","type":"journal-article","created":{"date-parts":[[2023,1,23]],"date-time":"2023-01-23T04:19:22Z","timestamp":1674447562000},"page":"650","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Attention Fusion of Transformer-Based and Scale-Based Method for Hyperspectral and LiDAR Joint Classification"],"prefix":"10.3390","volume":"15","author":[{"given":"Maqun","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China"},{"name":"Institute of Marine Development, Ocean University of China, Qingdao 266100, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1825-328X","authenticated-orcid":false,"given":"Feng","family":"Gao","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China"},{"name":"Institute of Marine Development, Ocean University of China, Qingdao 266100, 
China"}]},{"given":"Tiange","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China"},{"name":"Institute of Marine Development, Ocean University of China, Qingdao 266100, China"}]},{"given":"Yanhai","family":"Gan","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China"},{"name":"Institute of Marine Development, Ocean University of China, Qingdao 266100, China"}]},{"given":"Junyu","family":"Dong","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China"},{"name":"Institute of Marine Development, Ocean University of China, Qingdao 266100, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7655-9228","authenticated-orcid":false,"given":"Hui","family":"Yu","sequence":"additional","affiliation":[{"name":"Faculty of Creative & Cultural Industries, University of Portsmouth, Portsmouth PO1 2DJ, UK"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,21]]},"reference":[{"key":"ref_1","first-page":"1","article-title":"A 3 CLNN: Spatial spectral and multiscale attention ConvLSTM neural network for multisource remote sensing data classification","volume":"33","author":"Li","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4939","DOI":"10.1109\/TGRS.2020.2969024","article-title":"Classification of hyperspectral and LiDAR data using coupled CNNs","volume":"58","author":"Hang","year":"2020","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_3","first-page":"1","article-title":"Information fusion for classification of hyperspectral and LiDAR data using IP-CNN","volume":"60","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Geosci. Remote. 
Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/TCYB.2018.2864670","article-title":"Feature extraction for classification of hyperspectral and LiDAR data using patch-to-patch CNN","volume":"50","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Cybern."},{"key":"ref_5","first-page":"1","article-title":"Unsupervised self-correlated learning smoothy enhanced locality preserving graph convolution embedding clustering for hyperspectral images","volume":"60","author":"Ding","year":"2022","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_6","first-page":"1","article-title":"Semi-supervised locality preserving dense graph neural network with ARMA filters and context-aware learning for hyperspectral image classification","volume":"60","author":"Ding","year":"2021","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_7","first-page":"1","article-title":"Self-supervised locality preserving low-pass graph convolutional embedding for large-scale hyperspectral image clustering","volume":"60","author":"Ding","year":"2022","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_8","first-page":"1","article-title":"Graph sample and aggregate-attention network for hyperspectral image classification","volume":"19","author":"Ding","year":"2021","journal-title":"IEEE Geosci. Remote. Sens. Lett."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/j.ins.2022.04.006","article-title":"AF2GNN: Graph convolution with adaptive filters and aggregator fusion for hyperspectral image classification","volume":"602","author":"Ding","year":"2022","journal-title":"Inf. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Yao, D., Zhi-li, Z., Xiao-feng, Z., Wei, C., Fang, H., Yao-ming, C., and Cai, W.W. (Def. Technol., 2022). Deep hybrid: Multi-graph neural network collaboration for hyperspectral image classification, Def. 
Technol., in press.","DOI":"10.1016\/j.dt.2022.02.007"},{"key":"ref_11","first-page":"30","article-title":"Attention is all you need","volume":"2017","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_12","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., and Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_13","unstructured":"Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B., Liu, Z., Lin, Y., and Cao, Y. (2021, January 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3095","DOI":"10.1109\/TIP.2022.3162964","article-title":"Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification","volume":"31","author":"Xue","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_15","unstructured":"Li, L.H., Yatskar, M., Yin, D., Hsieh, C.J., and Chang, K.W. (2019). Visualbert: A simple and performant baseline for vision and language. arXiv."},{"key":"ref_16","first-page":"32","article-title":"Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks","volume":"2019","author":"Lu","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhao, X., Zhang, M., Tao, R., Li, W., Liao, W., Tian, L., and Philips, W. (2022). Fractional fourier image transformer for multimodal remote sensing data classification. IEEE Trans. Neural Netw. Learn. Syst., 1\u201313.","DOI":"10.1109\/TNNLS.2022.3189994"},{"key":"ref_18","unstructured":"Yuxuan, H., He, H., and Weng, L. (2022, January 17\u201322). Hyperspectral and LiDAR Data Land-Use Classification Using Parallel Transformers. 
Proceedings of the IGARSS 2022\u20132022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Li, G., Duan, N., Fang, Y., Gong, M., and Jiang, D. (2020, January 7\u201312). Unicoder-vl: A universal encoder for vision and language by cross-modal pre-training. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA. No. 07.","DOI":"10.1609\/aaai.v34i07.6795"},{"key":"ref_20","unstructured":"Su, W., Zhu, X., Cao, Y., Li, B., Lu, L., Wei, F., and Dai, J. (2019). Vl-bert: Pre-training of generic visual-linguistic representations. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, Y.C., Li, L., Yu, L., El Kholy, A., Ahmed, F., Gan, Z., and Liu, J. (2020, January 23\u201328). Uniter: Universal Image-Text Representation Learning. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58577-8_7"},{"key":"ref_22","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., and Sutskever, I. (2021, January 18\u201324). Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning, Virtual. PMLR."},{"key":"ref_23","unstructured":"Huang, Z., Zeng, Z., Liu, B., Fu, D., and Fu, J. (2020). Pixel-bert: Aligning image pixels with text by deep multi-modal transformers. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhen, L., Hu, P., Wang, X., and Peng, D. (2019, January 15\u201320). Deep supervised cross-modal retrieval. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01064"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. 
Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wang, H., Zhu, Y., Adam, H., Yuille, A., and Chen, L.C. (2021, January 20\u201325). Max-deeplab: End-to-end panoptic segmentation with mask transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00542"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ren, P., Li, C., Wang, G., Xiao, Y., Du, Q., Liang, X., and Chang, X. (2022, January 18\u201324). Beyond Fixation: Dynamic Window Visual Transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01168"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3599","DOI":"10.1109\/TGRS.2018.2886022","article-title":"A CNN with multiscale convolution and diversified metric for hyperspectral image classification","volume":"57","author":"Gong","year":"2019","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1109\/LGRS.2019.2918719","article-title":"HybridSN: Exploring 3-D\u20132-D CNN feature hierarchy for hyperspectral image classification","volume":"17","author":"Roy","year":"2019","journal-title":"IEEE Geosci. Remote. Sens. Lett."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Hu, R., and Amanpreet, S. (2021, January 10\u201317). Unit: Multimodal multitask learning with a unified transformer. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00147"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2354","DOI":"10.1109\/TIP.2018.2799324","article-title":"Hyperspectral image classification with Markov random fields and a convolutional neural network","volume":"27","author":"Cao","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1109\/TGRS.2016.2616355","article-title":"Hyperspectral image classification using deep pixel-pair features","volume":"55","author":"Li","year":"2016","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4843","DOI":"10.1109\/TIP.2017.2725580","article-title":"Going deeper with contextual CNN for hyperspectral image classification","volume":"26","author":"Lee","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wu, H., and Prasad, S. (2017). Convolutional recurrent neural networks for hyperspectral data classification. Remote. Sens., 9.","DOI":"10.3390\/rs9030298"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1109\/TGRS.2017.2756851","article-title":"Multisource remote sensing data classification based on convolutional neural network","volume":"56","author":"Xu","year":"2017","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"7355","DOI":"10.1109\/TGRS.2020.2982064","article-title":"Joint classification of hyperspectral and LiDAR data using hierarchical random walk and deep CNN architecture","volume":"58","author":"Zhao","year":"2020","journal-title":"IEEE Trans. Geosci. Remote. 
Sens."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/LGRS.2022.3214929","article-title":"Deep encoder-decoder networks for classification of hyperspectral and LiDAR data","volume":"19","author":"Hong","year":"2022","journal-title":"IEEE Geosci. Remote. Sens. Lett."},{"key":"ref_38","first-page":"1","article-title":"S2ENet: Spatial\u2013spectral cross-modal enhancement network for classification of hyperspectral and LiDAR data","volume":"19","author":"Fang","year":"2022","journal-title":"IEEE Geosci. Remote. Sens. Lett."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/3\/650\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:13:10Z","timestamp":1760119990000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/3\/650"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,21]]},"references-count":38,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["rs15030650"],"URL":"https:\/\/doi.org\/10.3390\/rs15030650","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,21]]}}}