{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T14:26:11Z","timestamp":1775226371465,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,4,17]],"date-time":"2022-04-17T00:00:00Z","timestamp":1650153600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Shenzhen Science and Technology Program","award":["KQTD20190929172704911"],"award-info":[{"award-number":["KQTD20190929172704911"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Existing optical remote sensing image change detection (CD) methods aim to learn an appropriate discriminate decision by analyzing the feature information of bitemporal images obtained at the same place. However, the complex scenes in high-resolution (HR) remote images cause unsatisfied results, especially for some irregular and occluded objects. Although recent self-attention-driven change detection models with CNN achieve promising effects, the computational and consumed parameters costs emerge as an impassable gap for HR images. In this paper, we utilize a transformer structure replacing self-attention to learn stronger feature representations per image. In addition, concurrent vision transformer models only consider tokenizing single-dimensional image tokens, thus failing to build multi-scale long-range interactions among features. Here, we propose a hybrid multi-scale transformer module for HR remote images change detection, which fully models representation attentions at hybrid scales of each image via a fine-grained self-attention mechanism. The key idea of the hybrid transformer structure is to establish heterogeneous semantic tokens containing multiple receptive fields, thus simultaneously preserving large object and fine-grained features. For building relationships between features without embedding with token sequences from the Siamese tokenizer, we also introduced a hybrid difference transformer decoder (HDTD) layer to further strengthen multi-scale global dependencies of high-level features. Compared to capturing single-stream tokens, our HDTD layer directly focuses representing differential features without increasing exponential computational cost. Finally, we propose a cascade feature decoder (CFD) for aggregating different-dimensional upsampling features by establishing difference skip-connections. To evaluate the effectiveness of the proposed method, experiments on two HR remote sensing CD datasets are conducted. Compared to state-of-the-art methods, our Hybrid-TransCD achieved superior performance on both datasets (i.e., LEVIR-CD, SYSU-CD) with improvements of 0.75% and 1.98%, respectively.<\/jats:p>","DOI":"10.3390\/ijgi11040263","type":"journal-article","created":{"date-parts":[[2022,4,18]],"date-time":"2022-04-18T04:21:28Z","timestamp":1650255688000},"page":"263","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":51,"title":["Hybrid-TransCD: A Hybrid Transformer Remote Sensing Image Change Detection Network via Token Aggregation"],"prefix":"10.3390","volume":"11","author":[{"given":"Qingtian","family":"Ke","sequence":"first","affiliation":[{"name":"School of Electronics and Communication Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5267-0936","authenticated-orcid":false,"given":"Peng","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Electronics and Communication Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Shi, W., Zhang, M., Zhang, R., Chen, S., and Zhan, Z. (2020). Change detection based on artificial intelligence: State-of-the-art and challenges. Remote Sens., 12.","DOI":"10.3390\/rs12101688"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1109\/LGRS.2020.2988032","article-title":"Building Change Detection for Remote Sensing Images Using a Dual-Task Constrained Deep Siamese Convolutional Network Model","volume":"18","author":"Liu","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Fang, B., Pan, L., and Kou, R. (2019). Dual learning-based siamese framework for change detection using bitemporal VHR optical remote sensing images. Remote Sens., 11.","DOI":"10.3390\/rs11111292"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"12279","DOI":"10.1109\/ACCESS.2020.2964798","article-title":"Change detection on multi-spectral images based on feature-level U-Net","volume":"8","author":"Wiratama","year":"2020","journal-title":"IEEE Access"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wu, C., Zhang, F., Xia, J., Xu, Y., Li, G., Xie, J., Du, Z., and Liu, R. (2021). Building Damage Detection Using U-Net with Attention Mechanism from Pre-and Post-Disaster Remote Sensing Datasets. Remote Sens., 13.","DOI":"10.3390\/rs13050905"},{"key":"ref_6","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_8","unstructured":"Kolesnikov, A., Dosovitskiy, A., Weissenborn, D., Heigold, G., Uszkoreit, J., Beyer, L., Minderer, M., Dehghani, M., Houlsby, N., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Ma, A., Zhang, L., and Zhong, Y. (2021, January 11\u201317). Change is Everywhere: Single-Temporal Supervised Object Change Detection in Remote Sensing Imagery. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01491"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1109\/JSTARS.2020.2974276","article-title":"Deep depthwise separable convolutional network for change detection in optical aerial images","volume":"13","author":"Liu","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"9987","DOI":"10.1109\/JSTARS.2021.3113831","article-title":"CS-HSNet: A Cross-Siamese Change Detection Network Based on Hierarchical-Split Attention","volume":"14","author":"Ke","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., and Liang, J. (2018). Unet++: A nested u-net architecture for medical image segmentation. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer.","DOI":"10.1007\/978-3-030-00889-5_1"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3520","DOI":"10.1109\/TIP.2019.2962685","article-title":"Semantic segmentation with context encoding and multi-path decoding","volume":"29","author":"Ding","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1194","DOI":"10.1109\/JSTARS.2020.3037893","article-title":"DASNet: Dual attentive fully convolutional siamese networks for change detection of high resolution satellite images","volume":"14","author":"Chen","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Chen, H., and Shi, Z. (2020). A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens., 12.","DOI":"10.3390\/rs12101662"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ke, Q., and Zhang, P. (2021). MCCRNet: A Multi-Level Change Contextual Refinement Network for Remote Sensing Image Change Detection. ISPRS Int. J. Geo.-Inf., 10.","DOI":"10.3390\/ijgi10090591"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Fu, L., Li, Y., and Zhang, Y. (2021). Hdfnet: Hierarchical dynamic fusion network for change detection in optical aerial images. Remote Sens., 13.","DOI":"10.3390\/rs13081440"},{"key":"ref_19","first-page":"1","article-title":"SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images","volume":"19","author":"Fang","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.isprsjprs.2020.06.003","article-title":"A deeply supervised image fusion network for change detection in high resolution bitemporal remote sensing images","volume":"166","author":"Zhang","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/LGRS.2022.3144304","article-title":"EUNet-CD: Efficient UNet++ for Change Detection of Very High-Resolution Remote Sensing Images","volume":"19","author":"Raza","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_22","unstructured":"Chen, H., Qi, Z., and Shi, Z. (2021). Efficient transformer based method for remote sensing image change detection. arXiv e-Prints."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"41409","DOI":"10.1364\/OE.440720","article-title":"TransCD: Scene change detection via transformer-based architecture","volume":"29","author":"Wang","year":"2021","journal-title":"Opt. Express"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. arXiv.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. arXiv.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_26","unstructured":"Wang, W., Yao, L., Chen, L., Lin, B., Cai, D., He, X., and Liu, W. (2021). CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lin, H., Cheng, X., Wu, X., Yang, F., Shen, D., Wang, Z., Song, Q., and Yuan, W. (2021). CAT: Cross Attention in Vision Transformer. arXiv.","DOI":"10.1109\/ICME52920.2022.9859720"},{"key":"ref_28","first-page":"1","article-title":"A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection","volume":"60","author":"Shi","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/11\/4\/263\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:55:35Z","timestamp":1760136935000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/11\/4\/263"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,17]]},"references-count":30,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["ijgi11040263"],"URL":"https:\/\/doi.org\/10.3390\/ijgi11040263","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,17]]}}}