{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:44:27Z","timestamp":1760237067066,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2020,2,25]],"date-time":"2020-02-25T00:00:00Z","timestamp":1582588800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41771457"],"award-info":[{"award-number":["41771457"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>3D pose estimation is always an active but challenging task for object detection in remote sensing images. In this paper, we present a new algorithm for predicting an object\u2019s 3D pose in remote sensing images, called Anchor Points Prediction (APP). Compared to previous methods, such as RoI Transform, our object results of the final output can obtain direction information. We predict the object\u2019s multiple feature points based on the neural network to obtain the homograph transformation relationship between object coordinates and image coordinates. The resulting 3D pose can accurately describe the three-dimensional position and attitude of the object. At the same time, we redefine the method IoU<jats:sub>APP<\/jats:sub> for calculating the direction and posture of the object. We tested our algorithm on the HRSC2016 dataset and the DOTA dataset with accuracy rates of 0.863 and 0.701, respectively. The experimental results show that the accuracy of the APP algorithm is significantly improved. 
At the same time, the algorithm can achieve one-stage prediction, which makes the calculation process easier and more efficient.<\/jats:p>","DOI":"10.3390\/s20051240","type":"journal-article","created":{"date-parts":[[2020,2,25]],"date-time":"2020-02-25T08:12:22Z","timestamp":1582618342000},"page":"1240","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["3D Pose Estimation for Object Detection in Remote Sensing Images"],"prefix":"10.3390","volume":"20","author":[{"given":"Jin","family":"Liu","sequence":"first","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6429-6830","authenticated-orcid":false,"given":"Yongjian","family":"Gao","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,2,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2486","DOI":"10.1109\/TGRS.2016.2645610","article-title":"Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks","volume":"55","author":"Long","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"851","DOI":"10.1109\/LGRS.2017.2683495","article-title":"Feature Extraction by Rotation-Invariant Matrix Representation for Object Detection in Aerial Image","volume":"14","author":"Wang","year":"2017","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3652","DOI":"10.1109\/JSTARS.2017.2694890","article-title":"Toward Fast and Accurate Vehicle Detection in Aerial Images Using Coupled Region-Based Convolutional Neural Networks","volume":"10","author":"Deng","year":"2017","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 13\u201316). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_5","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Curran Associates, Inc."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201322). Cascade R-CNN: Delving Into High Quality Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 24\u201327). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ding, J., Xue, N., Long, Y., Xia, G.S., and Lu, Q. (2019, January 16\u201319). Learning RoI Transformer for Oriented Object Detection in Aerial Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Los Angeles, CA, USA.","DOI":"10.1109\/CVPR.2019.00296"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ren, H., El-Khamy, M., and Lee, J. (2018, January 12\u201315). 
CT-SRCNN: Cascade Trained and Trimmed Deep Convolutional Neural Networks for Image Super Resolution. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00160"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_11","unstructured":"Etten, A.V. (2018). You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery, Cornell University."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_13","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement, Cornell University."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1074","DOI":"10.1109\/LGRS.2016.2565705","article-title":"Ship Rotated Bounding Box Space for Ship Extraction From High-Resolution Optical Satellite Images With Complex Backgrounds","volume":"13","author":"Liu","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, Z., Hu, J., Weng, L., and Yang, Y. (2017, January 18\u201320). Rotated region based CNN for ship detection. 
Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296411"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1938","DOI":"10.1109\/LGRS.2015.2439517","article-title":"Fast Multiclass Vehicle Detection on Aerial Images","volume":"12","author":"Liu","year":"2015","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). CornerNet: Detecting Objects as Paired Keypoints. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_18","unstructured":"Pereira, F., Burges, C.J.C., Bottou, L., and Weinberger, K.Q. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25, Curran Associates, Inc."},{"key":"ref_19","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition, Cornell University."},{"key":"ref_20","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_22","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as Points, Cornell University."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Rad, M., and Lepetit, V. (2017, January 22\u201329). 
BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects Without Using Depth. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.413"},{"key":"ref_24","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1499","DOI":"10.1109\/LSP.2016.2603342","article-title":"Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks","volume":"23","author":"Zhang","year":"2016","journal-title":"IEEE Signal Proce. Lett."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Cao, Z., Simon, T., Wei, S.E., and Sheikh, Y. (2017, January 21\u201326). Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Neubeck, A., and Van Gool, L. (2006, January 20\u201324). Efficient Non-Maximum Suppression. Proceedings of the 18th International Conference on Pattern Recognition (ICPR\u201906), Hong Kong, China.","DOI":"10.1109\/ICPR.2006.479"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Li, Y., Wang, G., Ji, X., Xiang, Y., and Fox, D. (2018, January 8\u201314). DeepIM: Deep Iterative Matching for 6D Pose Estimation. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_42"},{"key":"ref_30","unstructured":"Liu, J., and He, S. (2019). 6D Object Pose Estimation without PnP, Cornell University."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1745","DOI":"10.1109\/LGRS.2018.2856921","article-title":"Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks","volume":"15","author":"Zhang","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liao, M., Zhu, Z., Shi, B., Xia, G.S., and Bai, X. (2018, January 18\u201322). Rotation-Sensitive Regression for Oriented Scene Text Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake, UT, USA.","DOI":"10.1109\/CVPR.2018.00619"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Xia, G.S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201322). DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"3111","DOI":"10.1109\/TMM.2018.2818020","article-title":"Arbitrary-Oriented Scene Text Detection via Rotation Proposals","volume":"20","author":"Ma","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Jiang, Y., Zhu, X., Wang, X., Yang, S., Li, W., Wang, H., Fu, P., and Luo, Z. (2017). R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection, Cornell University.","DOI":"10.1109\/ICPR.2018.8545598"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Kehl, W., Manhardt, F., Tombari, F., Ilic, S., and Navab, N. (2017, January 22\u201329). 
SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.169"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/5\/1240\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:01:27Z","timestamp":1760173287000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/5\/1240"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,25]]},"references-count":36,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2020,3]]}},"alternative-id":["s20051240"],"URL":"https:\/\/doi.org\/10.3390\/s20051240","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2020,2,25]]}}}