{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T11:26:15Z","timestamp":1770463575475,"version":"3.49.0"},"reference-count":40,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2022,4,27]],"date-time":"2022-04-27T00:00:00Z","timestamp":1651017600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"China Postdoctoral Science Foundation","award":["2019M653742"],"award-info":[{"award-number":["2019M653742"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>The complexity of backgrounds, the diversity of object scale and orientation, and the defects of convolutional neural network (CNN) have always been the challenges of oriented object detection in remote sensing images (RSIs). This paper designs a hybrid network model to meet these challenges and further improve the effect of oriented object detection. The inductive bias of CNN makes the network translation invariant, but it is difficult to adapt to RSIs with arbitrary object direction. Therefore, this paper designs a hybrid network, TransConvNet, which integrates the advantages of CNN and self-attention-based network, pays more attention to the aggregation of global and local information, makes up for the lack of rotation invariability of CNN with strong contextual attention, and adapts to the arbitrariness of the object direction of RSIs. In addition, to resolve the influence of complex backgrounds and multi-scale, an adaptive feature fusion network (AFFN) is designed to improve the information representation ability of feature maps with different resolutions. Finally, the adaptive weight loss function is used to train the network to further improve the effect of object detection. 
Extensive experimental results on the DOTA, UCASAOD, and VEDAI data sets demonstrate the effectiveness of the proposed method.<\/jats:p>","DOI":"10.3390\/rs14092090","type":"journal-article","created":{"date-parts":[[2022,4,27]],"date-time":"2022-04-27T22:20:20Z","timestamp":1651098020000},"page":"2090","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Hybrid Network Model: TransConvNet for Oriented Object Detection in Remote Sensing Images"],"prefix":"10.3390","volume":"14","author":[{"given":"Xulun","family":"Liu","sequence":"first","affiliation":[{"name":"Aviation Engineering School, Air Force Engineering University, Xi\u2019an 710038, China"}]},{"given":"Shiping","family":"Ma","sequence":"additional","affiliation":[{"name":"Aviation Engineering School, Air Force Engineering University, Xi\u2019an 710038, China"}]},{"given":"Linyuan","family":"He","sequence":"additional","affiliation":[{"name":"Aviation Engineering School, Air Force Engineering University, Xi\u2019an 710038, China"},{"name":"Unmanned System Research Institute, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"Aviation Engineering School, Air Force Engineering University, Xi\u2019an 710038, China"}]},{"given":"Zhe","family":"Chen","sequence":"additional","affiliation":[{"name":"National Engineering Laboratory for Wireless Security, Xi\u2019an University of Posts and Telecommunications, Xi\u2019an 710121, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Yang, F., Fan, H., Chu, P., Blasch, E., and Ling, H. (2019, January 27\u201328). Clustered object detection in aerial images. 
Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00840"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1109\/LGRS.2019.2930462","article-title":"Multi-scale spatial and channel-wise attention for improving object detection in remote sensing imagery","volume":"17","author":"Chen","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_3","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv."},{"key":"ref_4","first-page":"8004405","article-title":"FRPNet: A feature-reflowing pyramid network for object detection of remote sensing images","volume":"19","author":"Wang","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1745","DOI":"10.1109\/LGRS.2018.2856921","article-title":"Toward arbitrary-oriented ship detection with rotated region proposal and discrimination networks","volume":"15","author":"Zhang","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"10015","DOI":"10.1109\/TGRS.2019.2930982","article-title":"Cad-net: A context-aware detection network for objects in remote sensing imagery","volume":"57","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yang, X., Yang, J., Yan, J., Zhang, Y., Zhang, T., Guo, Z., Sun, X., and Fu, K. (2019, January 23\u201325). Scrdet: Towards more robust detection for small, cluttered and rotated objects. Proceedings of the IEEE International Conference on Computer Vision, Thessaloniki, Greece.","DOI":"10.1109\/ICCV.2019.00832"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ding, J., Xue, N., Long, Y., Xia, G.S., and Lu, Q. (2019, January 16\u201320). 
Learning RoI transformer for oriented object detection in aerial images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00296"},{"key":"ref_9","unstructured":"D\u2019Ascoli, S., Touvron, H., Leavitt, M.L., Morcos, A.S., Biroli, G., and Sagun, L. (2021, January 18\u201324). Convit: Improving vision transformers with soft convolutional inductive biases. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_10","first-page":"5998","article-title":"Attention is all you need","volume":"3","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_11","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201322). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xia, G.-S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201322). DOTA: A large-scale dataset for object detection in aerial images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhu, H., Chen, X., Dai, W., Fu, K., Ye, Q., and Jiao, J. (2015, January 27\u201330). Orientation robust object detection in aerial images using deep convolutional neural network. 
Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada.","DOI":"10.1109\/ICIP.2015.7351502"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.jvcir.2015.11.002","article-title":"Vehicle detection in aerial imagery: A small target detection benchmark","volume":"34","author":"Razakarivony","year":"2016","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_21","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_23","unstructured":"Tan, M., and Le, Q. (2019, January 10\u201315). Efficientnet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_24","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_25","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers distillation through attention. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Jiang, Z.-H., Tay, F.E., Feng, J., and Yan, S. (2021, January 18\u201324). Tokens-to-token vit: Training vision transformers from scratch on imagenet. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00060"},{"key":"ref_27","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Cai, Z., Fan, Q., Feris, R.S., and Vasconcelos, N. (2016, January 8\u201316). A unified multi-scale deep convolutional neural network for fast object detection. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_22"},{"key":"ref_30","unstructured":"Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., and Ling, H. (February, January 27). M2det: A single-shot object detector based on multi-level feature pyramid network. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_31","unstructured":"Liu, S., Huang, D., and Wang, Y. (2019). Learning spatial fusion for single-shot object detection. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019, January 23\u201325). Fcos: Fully convolutional one-stage object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Thessaloniki, Greece.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_34","unstructured":"Kendall, A., Gal, Y., and Cipolla, R. (2018, January 18\u201322). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA."},{"key":"ref_35","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_36","unstructured":"Yang, X., Liu, Q., Yan, J., Li, A., Zhang, Z., and Yu, G. (2019). R3det: Refined single-stage detector with feature refinement for rotating object. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1452","DOI":"10.1109\/TPAMI.2020.2974745","article-title":"Gliding vertex on the horizontal bounding box for multi-oriented object detection","volume":"43","author":"Xu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Yi, J., Wu, P., Liu, B., Huang, Q., Qu, H., and Metaxas, D. (2020, January 1\u20135). Oriented object detection in aerial images with box boundary-aware vectors. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV48630.2021.00220"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yang, X., and Yan, J. (2020, January 23\u201328). Arbitrary-oriented object detection with circular smooth label. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58598-3_40"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"3111","DOI":"10.1109\/TMM.2018.2818020","article-title":"Arbitrary-oriented scene text detection via rotation proposals","volume":"20","author":"Ma","year":"2018","journal-title":"IEEE Trans. 
Multimed."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/9\/2090\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:01:52Z","timestamp":1760137312000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/9\/2090"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,27]]},"references-count":40,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["rs14092090"],"URL":"https:\/\/doi.org\/10.3390\/rs14092090","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,27]]}}}