{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T20:52:03Z","timestamp":1771102323219,"version":"3.50.1"},"reference-count":57,"publisher":"MDPI AG","issue":"24","license":[{"start":{"date-parts":[[2022,12,14]],"date-time":"2022-12-14T00:00:00Z","timestamp":1670976000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["42171458"],"award-info":[{"award-number":["42171458"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2021JJ30818"],"award-info":[{"award-number":["2021JJ30818"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2019QJCZ006"],"award-info":[{"award-number":["2019QJCZ006"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Natural Science Foundation of Hunan Province, China","award":["42171458"],"award-info":[{"award-number":["42171458"]}]},{"name":"Natural Science Foundation of Hunan Province, China","award":["2021JJ30818"],"award-info":[{"award-number":["2021JJ30818"]}]},{"name":"Natural Science Foundation of Hunan Province, China","award":["2019QJCZ006"],"award-info":[{"award-number":["2019QJCZ006"]}]},{"name":"Supported by the Young Teacher Development Program of Changsha University of Science and Technology","award":["42171458"],"award-info":[{"award-number":["42171458"]}]},{"name":"Supported by the Young Teacher Development Program of Changsha University of Science and 
Technology","award":["2021JJ30818"],"award-info":[{"award-number":["2021JJ30818"]}]},{"name":"Supported by the Young Teacher Development Program of Changsha University of Science and Technology","award":["2019QJCZ006"],"award-info":[{"award-number":["2019QJCZ006"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>The detection of arbitrarily rotated objects in aerial images is challenging due to the highly complex backgrounds and the multiple angles of objects. Existing detectors are not robust relative to the varying angle of objects because the CNNs do not explicitly model the orientation\u2019s variation. In this paper, we propose an Orientation Robust Detector (OrtDet) to solve this problem, which aims to learn features that change accordingly with the object\u2019s rotation (i.e., rotation-equivariant features). Specifically, we introduce a vision transformer as the backbone to capture its remote contextual associations via the degree of feature similarities. By capturing the features of each part of the object and their relative spatial distribution, OrtDet can learn features that have a complete response to any direction of the object. In addition, we use the tokens concatenation layer (TCL) strategy, which generates a pyramidal feature hierarchy for addressing vastly different scales of objects. To avoid the confusion of angle regression, we predict the relative gliding offsets of the vertices in each corresponding side of the horizontal bounding boxes (HBBs) to represent the oriented bounding boxes (OBBs). To intuitively reflect the robustness of the detector, a new metric, the mean rotation precision (mRP), is proposed to quantitatively measure the model\u2019s learning ability for a rotation-equivariant feature. 
Experiments on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets show that our method improves the mAP by 0.5, 1.1, and 2.2 and reduces mRP detection fluctuations by 0.74, 0.56, and 0.52, respectively.<\/jats:p>","DOI":"10.3390\/rs14246329","type":"journal-article","created":{"date-parts":[[2022,12,14]],"date-time":"2022-12-14T05:59:40Z","timestamp":1670997580000},"page":"6329","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["OrtDet: An Orientation Robust Detector via Transformer for Object Detection in Aerial Images"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6103-1113","authenticated-orcid":false,"given":"Ling","family":"Zhao","sequence":"first","affiliation":[{"name":"School of Geosciences and Info-Physics, Central South University, South Lushan Road, Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianhua","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Geosciences and Info-Physics, Central South University, South Lushan Road, Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuchun","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Traffic and Transportation Engineering, Changsha University of Science & Technology, Changsha 410114, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1675-2170","authenticated-orcid":false,"given":"Haoze","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Geosciences and Info-Physics, Central South University, South Lushan Road, Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7948-579X","authenticated-orcid":false,"given":"Ji","family":"Qi","sequence":"additional","affiliation":[{"name":"School of Geosciences and Info-Physics, Central South 
University, South Lushan Road, Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Li","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2016.03.014","article-title":"A survey on object detection in optical remote sensing images","volume":"117","author":"Cheng","year":"2016","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wu, J., Cao, C., Zhou, Y., Zeng, X., Feng, Z., Wu, Q., and Huang, Z. (2021). Multiple Ship Tracking in Remote Sensing Images Using Deep Learning. Remote Sens., 13.","DOI":"10.3390\/rs13183601"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ding, J., Xue, N., Long, Y., Xia, G.S., and Lu, Q. (2019, January 15\u201320). Learning roi transformer for oriented object detection in aerial images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00296"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Azimi, S.M., Vig, E., Bahmanyar, R., K\u00f6rner, M., and Reinartz, P. (2018, January 2\u20136). Towards multi-class object detection in unconstrained remote sensing imagery. Proceedings of the Asian Conference on Computer Vision, Perth, Australia.","DOI":"10.1007\/978-3-030-20893-6_10"},{"key":"ref_6","unstructured":"Yang, X., Yang, J., Yan, J., Zhang, Y., Zhang, T., Guo, Z., Sun, X., and Fu, K.S. (November, January 27). Towards more robust detection for small, cluttered and rotated objects. 
Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1016\/j.isprsjprs.2020.09.022","article-title":"Oriented objects as pairs of middle lines","volume":"169","author":"Wei","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_8","first-page":"3163","article-title":"R3det: Refined single-stage detector with feature refinement for rotating object","volume":"35","author":"Yang","year":"2021","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kalra, A., Stoppi, G., Brown, B., Agarwal, R., and Kadambi, A. (2021, January 10\u201317). Towards Rotation Invariance in Object Detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00351"},{"key":"ref_10","first-page":"5602511","article-title":"Align deep features for oriented object detection","volume":"60","author":"Han","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","first-page":"8021505","article-title":"Optimization for arbitrary-oriented object detection via representation invariance loss","volume":"19","author":"Ming","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., and Kalinin, A.A. (2020). Albumentations: Fast and flexible image augmentations. Information, 11.","DOI":"10.3390\/info11020125"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Han, J., Ding, J., Xue, N., and Xia, G.S. (2021, January 20\u201325). Redet: A rotation-equivariant detector for aerial object detection. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00281"},{"key":"ref_14","unstructured":"Cohen, T., and Welling, M. (2016, January 20\u201322). Group equivariant convolutional networks. Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Worrall, D.E., Garbin, S.J., Turmukhambetov, D., and Brostow, G.J. (2017, January 21\u201326). Harmonic networks: Deep translation and rotation equivariance. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.758"},{"key":"ref_16","unstructured":"Cohen, T.S., Geiger, M., K\u00f6hler, J., and Welling, M. (May, January 30). Spherical CNNs. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_17","unstructured":"Hoogeboom, E., Peters, J.W., Cohen, T.S., and Welling, M. (2018). Hexaconv. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1109\/TIP.2018.2867198","article-title":"Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection","volume":"28","author":"Cheng","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1452","DOI":"10.1109\/TPAMI.2020.2974745","article-title":"Gliding vertex on the horizontal bounding box for multi-oriented object detection","volume":"43","author":"Xu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yang, X., and Yan, J. (2020, January 23\u201328). Arbitrary-oriented object detection with circular smooth label. 
Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58598-3_40"},{"key":"ref_21","first-page":"5612414","article-title":"Arbitrary-oriented ship detection through center-head point extraction","volume":"60","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_22","unstructured":"Shen, Z., Liu, J., He, Y., Zhang, X., Xu, R., Yu, H., and Cui, P. (2021). Towards out-of-distribution generalization: A survey. arXiv."},{"key":"ref_23","first-page":"5692","article-title":"Stable learning via sample reweighting","volume":"34","author":"Shen","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_24","first-page":"4485","article-title":"Stable prediction with model misspecification and agnostic distribution shift","volume":"34","author":"Kuang","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_25","unstructured":"Dosovitskiy, A., Beyer, L., and Kolesnikov, A. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_26","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141, and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, Y., Mao, H., Girshick, R., and He, K. (2022). Exploring plain vision transformer backbones for object detection. arXiv.","DOI":"10.1007\/978-3-031-20077-9_17"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_31","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. 
Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_35","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as points. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Xia, G.S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201323). DOTA: A large-scale dataset for object detection in aerial images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"3111","DOI":"10.1109\/TMM.2018.2818020","article-title":"Arbitrary-oriented scene text detection via rotation proposals","volume":"20","author":"Ma","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1745","DOI":"10.1109\/LGRS.2018.2856921","article-title":"Toward arbitrary-oriented ship detection with rotated region proposal and discrimination networks","volume":"15","author":"Zhang","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_39","unstructured":"Liu, L., Pan, Z., and Lei, B. (2017). Learning a rotation invariant detector with rotatable bounding box. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"223373","DOI":"10.1109\/ACCESS.2020.3041025","article-title":"Arbitrary-oriented object detection in remote sensing images based on polar coordinates","volume":"8","author":"Zhou","year":"2020","journal-title":"IEEE Access"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Dong, Z., Wang, M., Wang, Y., Liu, Y., Feng, Y., and Xu, W. (2022). Multi-Oriented Object Detection in High-Resolution Remote Sensing Imagery Based on Convolutional Neural Networks with Adaptive Object Orientation Features. 
Remote Sens., 14.","DOI":"10.3390\/rs14040950"},{"key":"ref_42","unstructured":"Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K. (2015, January 7\u201312). Spatial transformer networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 10\u201317). Deformable convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"5146","DOI":"10.1109\/TGRS.2019.2897139","article-title":"ORSIm detector: A novel object detection framework in optical remote sensing imagery using spatial-frequency channel features","volume":"57","author":"Wu","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_46","unstructured":"Zhu, X., Su, W., and Lu, L. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv."},{"key":"ref_47","unstructured":"Zheng, M., Gao, P., Zhang, R., Li, K., Wang, X., Li, H., and Dong, H. (2020). End-to-end object detection with adaptive clustering transformer. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Gao, P., Zheng, M., Wang, X., Dai, J., and Li, H. (2021, January 10\u201317). Fast convergence of detr with spatially modulated co-attention. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00360"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Li, Q., Chen, Y., and Zeng, Y. (2022). Transformer with Transfer CNN for Remote-Sensing-Image Object Detection. Remote Sens., 14.","DOI":"10.3390\/rs14040984"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Xu, X., Feng, Z., Cao, C., Li, M., Wu, J., Wu, Z., Shang, Y., and Ye, S. (2021). An improved swin transformer-based model for remote sensing object detection and instance segmentation. Remote Sens., 13.","DOI":"10.3390\/rs13234779"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Panboonyuen, T., Jitkajornwanich, K., Lawawirojwong, S., Srestasathiern, P., and Vateekul, P. (2021). Transformer-Based Decoder Designs for Semantic Segmentation on Remotely Sensed Images. Remote Sens., 13.","DOI":"10.3390\/rs13245100"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Xia, R., Chen, J., Huang, Z., Wan, H., Wu, B., Sun, L., Yao, B., Xiang, H., and Xing, M. (2022). CRTransSar: A Visual Transformer Based on Contextual Joint Representation Learning for SAR Ship Detection. Remote Sens., 14.","DOI":"10.3390\/rs14061488"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Zhang, J., Zhao, H., and Li, J. (2021). TRS: Transformers for remote sensing scene classification. Remote Sens., 13.","DOI":"10.3390\/rs13204143"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"324","DOI":"10.5220\/0006120603240331","article-title":"A high resolution optical satellite image dataset for ship recognition and some new baselines","volume":"Volume 2","author":"Liu","year":"2017","journal-title":"Proceedings of the International Conference on Pattern Recognition Applications and Methods"},{"key":"ref_55","unstructured":"Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., and Xu, J. (2019). 
MMDetection: Open mmlab detection toolbox and benchmark. arXiv."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_57","unstructured":"Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., and Ouyang, W. (November, January 27). Hybrid task cascade for instance segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seoul, Republic of Korea."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/24\/6329\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:41:09Z","timestamp":1760146869000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/24\/6329"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,14]]},"references-count":57,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["rs14246329"],"URL":"https:\/\/doi.org\/10.3390\/rs14246329","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,14]]}}}