{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:51:29Z","timestamp":1760233889683,"version":"build-2065373602"},"reference-count":39,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,3,1]],"date-time":"2021-03-01T00:00:00Z","timestamp":1614556800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61976010\uff0c61802011"],"award-info":[{"award-number":["61976010\uff0c61802011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In object detection for remote sensing images, anchor-free detectors often suffer from false boxes and sample imbalance, owing to the use of single oriented features and the key point-based boxing strategy. This paper presents RatioNet, a simple and effective anchor-free approach for sensing images with fewer parameters and higher accuracy, which assigns all points in ground-truth boxes as positive samples to alleviate the problem of sample imbalance. To deal with false boxes arising from single oriented features, global features of objects are investigated to build a novel regression that predicts boxes by predicting the width and height of objects and the corresponding ratios l_ratio and t_ratio, which reflect the location of objects. In addition, we introduce ratio-center to assign different weights to pixels, which preserves high-quality boxes and improves performance. 
On the MS-COCO test\u2013dev set, the proposed RatioNet achieves 49.7% AP.<\/jats:p>","DOI":"10.3390\/s21051672","type":"journal-article","created":{"date-parts":[[2021,3,1]],"date-time":"2021-03-01T03:35:40Z","timestamp":1614569740000},"page":"1672","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["RatioNet: Ratio Prediction Network for Object Detection"],"prefix":"10.3390","volume":"21","author":[{"given":"Kuan","family":"Zhao","sequence":"first","affiliation":[{"name":"Department of Information, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Boxuan","family":"Zhao","sequence":"additional","affiliation":[{"name":"Department of Information, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Lifang","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of Information, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Meng","family":"Jian","sequence":"additional","affiliation":[{"name":"Department of Information, Beijing University of Technology, Beijing 100124, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0401-1343","authenticated-orcid":false,"given":"Xu","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Information, Beijing University of Technology, Beijing 100124, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Bhagavatula, C., Zhu, C., Luu, K., and Savvides, M. (2017, January 22\u201329). Faster than real-time facial alignment: A 3d spatial transformer network approach in unconstrained poses. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.429"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zheng, Y., Pal, D.K., and Savvides, M. (2018, January 18\u201322). 
Ring loss: Convex feature normalization for face recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00534"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Doll\u00e1r, P., Wojek, C., Schiele, B., and Perona, P. (2009, January 20\u201325). Pedestrian detection: A benchmark. Proceedings of the IEEE International Conference on Computer Vision, Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206631"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Liang, X., Wang, T., Yang, L., and Xing, E. (2018, January 8\u201314). Cirl: Controllable imitative reinforcement learning for vision-based self-driving. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_36"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Bulat, A., and Tzimiropoulos, G. (2016, January 8\u201316). Human pose estimation via convolutional part heatmap regression. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46478-7_44"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., and Sun, J. (2018, January 18\u201322). Cascaded pyramid network for multi-person pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00742"},{"key":"ref_7","unstructured":"Fu, C., Liu, W., Ranga, A., Tyagi, A., and Berg, A.C. (2017). DSSD: Deconvolutional Single Shot Detector. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_9","unstructured":"Yang, Z., Liu, S., Hu, H., Wang, L., and Lin, S. (November, January 27). 
Reppoints: Point set representation for object detection. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_10","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as points. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_14","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., and Berg, A.C. (2016, January 8\u201316). SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (2019). Centernet: Object detection with keypoint triplets. arXiv.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). Cornernet: Detecting objects as paired keypoints. 
Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhou, X., Zhuo, J., and Krahenbuhl, P. (2019, January 16\u201320). Bottom-up object detection by grouping extreme and center points. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00094"},{"key":"ref_19","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). Fcos: Fully convolutional one-stage object detection. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Hosang, J., Benenson, R., and Schiele, B. (2017, January 21\u201326). Learning non-maximum suppression. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.685"},{"key":"ref_21","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster R-Cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-Cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201322). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_24","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). 
You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_26","unstructured":"Huang, L., Yang, Y., Deng, Y., and Yu, Y. (2015). Densebox: Unifying landmark localization with end to end object detection. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yu, J., Jiang, Y., Wang, Z., Cao, Z., and Huang, T. (2016, January 15\u201319). Unitbox: An advanced object detection network. Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands.","DOI":"10.1145\/2964284.2967274"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1007\/s10032-019-00335-y","article-title":"An anchor-free region proposal network for faster rcnn-based text detection approaches","volume":"22","author":"Zhong","year":"2019","journal-title":"Int. J. Doc. Anal. Recognit. (IJDAR)"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liu, W., Liao, S., Ren, W., Hu, W., and Yu, Y. (2019, January 16\u201320). High-level semantic feature detection: A new perspective for pedestrian detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00533"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016, January 8\u201316). Stacked hourglass networks for human pose estimation. 
Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhu, C., He, Y., and Savvides, M. (2019, January 16\u201320). Feature selective anchor-free module for single-shot object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00093"},{"key":"ref_32","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 22\u201329). Deformable convolutional networks. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhu, X., Hu, H., Lin, S., and Dai, J. (2019, January 16\u201320). Deformable convnets v2: More deformable, better results. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00953"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., and Lin, D. (2019, January 16\u201320). Libra r-cnn: Towards balanced learning for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00091"},{"key":"ref_39","unstructured":"Li, Y., Chen, Y., Wang, N., and Zhang, Z. (November, January 27). Scale-aware trident networks for object detection. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/5\/1672\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:30:44Z","timestamp":1760160644000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/5\/1672"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,1]]},"references-count":39,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["s21051672"],"URL":"https:\/\/doi.org\/10.3390\/s21051672","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,3,1]]}}}