{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T11:43:01Z","timestamp":1778758981587,"version":"3.51.4"},"reference-count":45,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T00:00:00Z","timestamp":1747267200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>The categorization of remote sensing satellite imagery is crucial for various applications, including environmental monitoring, urban planning, and disaster management. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have exhibited exceptional performance among deep learning techniques, excelling in feature extraction and representational learning. This paper presents a hybrid dual-stream ResV2ViT model that combines the advantages of ResNet50 V2 and Vision Transformer (ViT) architectures. The dual-stream approach allows the model to extract both local spatial features and global contextual information by processing data through two complementary pathways. The ResNet50V2 component is utilized for hierarchical feature extraction and captures short-range dependencies, whereas the ViT module efficiently models long-range dependencies and global contextual information. After position embedding in the hybrid model, the tokens are bifurcated into two parts: q1 and q2. q1 is passed into the convolutional block to refine local spatial details, and q2 is given to the Transformer to provide global attention to the spatial feature. Combining these two architectures allows the model to acquire low-level and high-level feature representations, improving classification performance. We assess the proposed ResV2ViT model using the RSI-CB256 dataset and another dataset with 21 classes. 
The proposed model attains an average accuracy of 99.91%, with precision and F1 score of 99.90% for the first dataset and 98.75% accuracy for the second dataset, illustrating its efficacy in satellite image classification. The findings demonstrate that the dual-stream hybrid ResV2ViT model surpasses traditional CNN and Transformer-based models, establishing it as a formidable framework for remote sensing applications.<\/jats:p>","DOI":"10.3390\/jimaging11050156","type":"journal-article","created":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T09:58:27Z","timestamp":1747303107000},"page":"156","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Unleashing the Potential of Residual and Dual-Stream Transformers for the Remote Sensing Image Analysis"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2171-0717","authenticated-orcid":false,"given":"Priya","family":"Mittal","sequence":"first","affiliation":[{"name":"Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, Punjab, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7215-0794","authenticated-orcid":false,"given":"Vishesh","family":"Tanwar","sequence":"additional","affiliation":[{"name":"Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura 140401, Punjab, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3400-3504","authenticated-orcid":false,"given":"Bhisham","family":"Sharma","sequence":"additional","affiliation":[{"name":"Centre of Research Impact and Outcome, Chitkara University, Rajpura 140401, Punjab, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dhirendra Prasad","family":"Yadav","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering & Applications, G.L.A. 
University, Mathura 281406, Uttar Pradesh, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,5,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Xie, G., and Niculescu, S. (2022). Mapping crop types using sentinel-2 data machine learning and monitoring crop phenology with sentinel-1 backscatter time series in pays de Brest, Brittany, France. Remote Sens., 14.","DOI":"10.3390\/rs14184437"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"111322","DOI":"10.1016\/j.rse.2019.111322","article-title":"Land-cover classification with high-resolution remote sensing images using transferable deep models","volume":"237","author":"Tong","year":"2020","journal-title":"Remote Sens. Environ."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1049\/cit2.12180","article-title":"Deep learning: Applications, architectures, models, tools, and frameworks: A comprehensive survey","volume":"8","author":"Gheisari","year":"2023","journal-title":"CAAI Trans. Intell. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1109\/JSTARS.2022.3226563","article-title":"Compression supports spatial deep learning","volume":"16","author":"Dax","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"5728","DOI":"10.1109\/JSEN.2017.2723599","article-title":"Fast motion object detection algorithm using complementary depth image on an RGB-D camera","volume":"17","author":"Sun","year":"2017","journal-title":"IEEE Sens. J."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"954","DOI":"10.1109\/JSTSP.2021.3058895","article-title":"RODNet: A real-time radar object detection network cross-supervised by camera-radar fused object 3D localization","volume":"15","author":"Wang","year":"2021","journal-title":"IEEE J. Sel. Top. 
Signal Process."},{"key":"ref_7","first-page":"5518612","article-title":"A Spectrum-Aware Transformer Network for Change Detection in Hyperspectral Imagery","volume":"61","author":"Zhang","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"103033","DOI":"10.1109\/ACCESS.2023.3307642","article-title":"DAHT-Net: Deformable Attention-Guided Hierarchical Transformer Network Based on Remote Sensing Image Change Detection","volume":"11","author":"Shi","year":"2023","journal-title":"IEEE Access"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1109\/JMASS.2023.3234076","article-title":"Deep Spatial Feature Transformation for Oriented Aerial Object Detection","volume":"4","author":"Gao","year":"2023","journal-title":"IEEE J. Miniaturization Air Space Syst."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"30883","DOI":"10.1109\/JSEN.2023.3330146","article-title":"A novel keypoint supplemented R-CNN for UAV object detection","volume":"23","author":"Butler","year":"2023","journal-title":"IEEE Sens. J."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1951","DOI":"10.1109\/JSTARS.2023.3241157","article-title":"TCIANet: Transformer-based context information aggregation network for remote sensing image change detection","volume":"16","author":"Xu","year":"2023","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"10266","DOI":"10.1109\/JSTARS.2024.3400458","article-title":"LFHNet: Lightweight Full-scale Hybrid Network for Remote Sensing Change Detection","volume":"17","author":"Jiang","year":"2024","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. 
Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3867","DOI":"10.1109\/JSTARS.2023.3264802","article-title":"HANet: A hierarchical attention network for change detection with bitemporal very-high-resolution remote sensing images","volume":"16","author":"Han","year":"2023","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2133","DOI":"10.1109\/JSTARS.2023.3327340","article-title":"CLDRNet: A Difference Refinement Network based on Category Context Learning for Remote Sensing Image Change Detection","volume":"17","author":"Wan","year":"2023","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"3104","DOI":"10.1109\/JSTARS.2023.3260006","article-title":"MDFENet: A multiscale difference feature enhancement network for remote sensing change detection","volume":"16","author":"Li","year":"2023","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2559","DOI":"10.1109\/JSTARS.2023.3251962","article-title":"Spectral token guidance transformer for multisource images change detection","volume":"16","author":"Sun","year":"2023","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"11080","DOI":"10.1109\/JSEN.2024.3357775","article-title":"CR-DINO: A Novel Camera-Radar Fusion 2D Object Detection Model Based On Transformer","volume":"24","author":"Jin","year":"2024","journal-title":"IEEE Sens. 
J."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"149308","DOI":"10.1109\/ACCESS.2024.3465027","article-title":"Application of Remote Sensing Image Change Detection Algorithm in Extracting Damaged Buildings in Earthquake Disaster","volume":"12","author":"Jia","year":"2024","journal-title":"IEEE Access"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"15407","DOI":"10.1109\/JSTARS.2024.3449923","article-title":"Transformer with feature interaction and fusion for remote sensing image change detection","volume":"17","author":"Guo","year":"2024","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"8888","DOI":"10.1109\/JSTARS.2024.3392917","article-title":"BD-MSA: Body decouple VHR Remote Sensing Image Change Detection method guided by multi-scale feature information aggregation","volume":"17","author":"Tan","year":"2024","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"3366","DOI":"10.1109\/JSTARS.2024.3350044","article-title":"Mask guided local-global attentive network for change detection in remote sensing images","volume":"17","author":"Xiong","year":"2024","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Jayasree, J., Madhavi, A.V., and Geetha, G. (2023, January 5\u20136). Multi-Label Classification On Aerial Images Using Deep Learning Techniques. Proceedings of the 2023 International Conference on Networking and Communications (ICNWC), Chennai, India.","DOI":"10.1109\/ICNWC57852.2023.10127406"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, H., Dou, X., Tao, C., Wu, Z., Chen, J., Peng, J., Deng, M., and Zhao, L. (2020). RSI-CB: A large-scale remote sensing image classification benchmark using crowdsourced data. 
Sensors, 20.","DOI":"10.3390\/s20061594"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1451","DOI":"10.1109\/LGRS.2018.2839092","article-title":"Enhanced fusion of deep neural networks for classification of benchmark high-resolution image data sets","volume":"15","author":"Scott","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Yogesh, T., and Devi, S.V.S. (2024, January 17\u201318). Enhancing Remote Sensing Image Classification: A Strategic Integration of Deep Learning Technique and Transfer Learning Approach. Proceedings of the 2024 Second International Conference on Data Science and Information System (ICDSIS), Hassan, India.","DOI":"10.1109\/ICDSIS61070.2024.10594062"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kaur, A., Gill, K.S., Chattopadhyay, S., and Singh, M. (2024, January 12\u201314). Next-Gen Land Cover Classification by Unleashing Transfer Learning in Satellite Imagery. Proceedings of the 2024 2nd World Conference on Communication & Computing (WCONF), Raipur, India.","DOI":"10.1109\/WCONF61366.2024.10692171"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5676","DOI":"10.1109\/TAI.2024.3423813","article-title":"Lightweight Parallel Convolutional Neural Network with SVM classifier for Satellite Imagery Classification","volume":"5","author":"Tumpa","year":"2024","journal-title":"IEEE Trans. Artif. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ulla, S., Shipra, E.H., Tahmeed, M.A., Saha, P., Palash, M.I.A., and Hossam-E-Haider, M. (2023, January 21\u201323). SatNet: A Lightweight Satellite Image Classification Model Using Deep Convolutional Neural Network. 
Proceedings of the 2023 IEEE International Conference on Telecommunications and Photonics (ICTP), Dhaka, Bangladesh.","DOI":"10.1109\/ICTP60248.2023.10490785"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Tehsin, S., Kausar, S., Jameel, A., Humayun, M., and Almofarreh, D.K. (2023). Satellite image categorization using scalable deep learning. Appl. Sci., 13.","DOI":"10.3390\/app13085108"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Sharma, I., and Gupta, S. (2023, January 6\u20138). A Hybrid Machine Learning and Deep Learning Approach for Remote Sensing Scene Classification. Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India.","DOI":"10.1109\/ICCCNT56998.2023.10307173"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, N., Mou, H., Tang, J., Wan, L., Li, Q., and Yuan, Y. (2022). Fully Connected Hashing Neural Networks for Indexing Large-Scale Remote Sensing Images. Mathematics, 10.","DOI":"10.3390\/math10244716"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2403","DOI":"10.1109\/LGRS.2015.2478966","article-title":"Land-use scene classification in high-resolution remote sensing images using improved correlatons","volume":"12","author":"Qi","year":"2015","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4620","DOI":"10.1109\/JSTARS.2014.2339842","article-title":"Land-use scene classification using a concentric circle-structured multiscale bag-of-visual-words model","volume":"7","author":"Zhao","year":"2014","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1895","DOI":"10.1109\/LGRS.2016.2616440","article-title":"Deep filter banks for land-use scene classification","volume":"13","author":"Wu","year":"2016","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1687","DOI":"10.1109\/LGRS.2019.2952660","article-title":"Further exploring convolutional neural networks\u2019 potential for land-use scene classification","volume":"17","author":"Li","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Hu, F., Xia, G.S., and Zhang, L. (2016, January 6\u201310). Deep sparse representations for land-use scene classification in remote sensing images. Proceedings of the 2016 IEEE 13th International Conference on Signal Processing (ICSP), Chengdu, China.","DOI":"10.1109\/ICSP.2016.7877822"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yao, Y., Liang, H., Li, X., Zhang, J., and He, J. (2017). Sensing urban land-use patterns by integrating Google Tensorflow and scene-classification models. arXiv.","DOI":"10.5194\/isprs-archives-XLII-2-W7-981-2017"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2250","DOI":"10.1109\/TGRS.2016.2640186","article-title":"Unsupervised feature learning for land-use scene recognition","volume":"55","author":"Fan","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1947","DOI":"10.1109\/TGRS.2014.2351395","article-title":"Pyramid of spatial relatons for scene-level land use classification","volume":"53","author":"Chen","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"2015","DOI":"10.1109\/JSTARS.2015.2444405","article-title":"Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification","volume":"8","author":"Hu","year":"2015","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Song, W., Cong, Y., Zhang, Y., and Zhang, S. (2022, January 11\u201313). 
Wavelet Attention ResNeXt Network for High-resolution Remote Sensing Scene Classification. Proceedings of the 2022 17th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore.","DOI":"10.1109\/ICARCV57592.2022.10004315"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"0684","DOI":"10.21123\/bsj.2024.9767","article-title":"Oil spill classification based on satellite image using deep learning techniques","volume":"21","author":"Abba","year":"2024","journal-title":"Baghdad Sci. J."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Saetchnikov, I., Skakun, V., and Tcherniavskaia, E. (2024, January 3\u20135). Aircraft Detection Approach Based on YOLOv9 for High-Resolution Remote Sensing. Proceedings of the 2024 11th International Workshop on Metrology for AeroSpace (MetroAeroSpace), Lublin, Poland.","DOI":"10.1109\/MetroAeroSpace61015.2024.10591528"},{"key":"ref_44","unstructured":"Le, T.D. (2024). On-board satellite image classification for earth observation: A comparative study of pre-trained vision transformer models. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"5411415","DOI":"10.1109\/TGRS.2022.3202036","article-title":"A 3-d-swin transformer-based hierarchical contrastive learning method for hyperspectral image classification","volume":"60","author":"Huang","year":"2022","journal-title":"IEEE Trans. Geosci. 
Remote Sens."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/5\/156\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:33:25Z","timestamp":1760031205000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/5\/156"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,15]]},"references-count":45,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["jimaging11050156"],"URL":"https:\/\/doi.org\/10.3390\/jimaging11050156","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,15]]}}}