{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T03:58:27Z","timestamp":1769918307437,"version":"3.49.0"},"reference-count":48,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2021,1,28]],"date-time":"2021-01-28T00:00:00Z","timestamp":1611792000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41671452"],"award-info":[{"award-number":["41671452"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41701532"],"award-info":[{"award-number":["41701532"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"China Postdoctoral Science Foundation-funded project","award":["2017M612510"],"award-info":[{"award-number":["2017M612510"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Accurate and efficient text detection in natural scenes is a fundamental yet challenging task in computer vision, especially when dealing with arbitrarily-oriented texts. Most contemporary text detection methods are designed to identify horizontal or approximately horizontal text, which cannot satisfy practical detection requirements for various real-world images such as image streams or videos. To address this lacuna, we propose a novel method called Rotational You Only Look Once (R-YOLO), a robust real-time convolutional neural network (CNN) model to detect arbitrarily-oriented texts in natural image scenes. First, a rotated anchor box with angle information is used as the text bounding box over various orientations. 
Second, features of various scales are extracted from the input image to determine the probability, confidence, and inclined bounding boxes of the text. Finally, Rotational Distance Intersection over Union Non-Maximum Suppression is used to eliminate redundancy and acquire detection results with the highest accuracy. Experiments on benchmark comparison are conducted upon four popular datasets, i.e., ICDAR2015, ICDAR2013, MSRA-TD500, and ICDAR2017-MLT. The results indicate that the proposed R-YOLO method significantly outperforms state-of-the-art methods in terms of detection efficiency while maintaining high accuracy; for example, the proposed R-YOLO method achieves an F-measure of 82.3% at 62.5 fps with 720p resolution on the ICDAR2015 dataset.<\/jats:p>","DOI":"10.3390\/s21030888","type":"journal-article","created":{"date-parts":[[2021,1,28]],"date-time":"2021-01-28T09:03:45Z","timestamp":1611824625000},"page":"888","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["R-YOLO: A Real-Time Text Detector for Natural Scenes with Arbitrary Rotation"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3442-2474","authenticated-orcid":false,"given":"Xiqi","family":"Wang","sequence":"first","affiliation":[{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shunyi","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5100-3584","authenticated-orcid":false,"given":"Ce","family":"Zhang","sequence":"additional","affiliation":[{"name":"Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, UK"},{"name":"UK Centre for Ecology & Hydrology, 
Library Avenue, Bailrigg, Lancaster LA1 4AP, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7858-3160","authenticated-orcid":false,"given":"Rui","family":"Li","sequence":"additional","affiliation":[{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Li","family":"Gui","sequence":"additional","affiliation":[{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China"},{"name":"Collaborative Innovation Center of Geospatial Technology, Wuhan University, Wuhan 430079, China"},{"name":"School of Electronic Information, Wuhan University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,1,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Neumann, L., and Matas, J. (2013, January 3\u20136). Scene text localization and recognition with oriented stroke detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia.","DOI":"10.1109\/ICCV.2013.19"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"800","DOI":"10.1109\/TIP.2010.2070803","article-title":"A hybrid approach to detect and localize texts in natural scene images","volume":"20","author":"Pan","year":"2011","journal-title":"IEEE Trans. Image Process."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"970","DOI":"10.1109\/TPAMI.2013.182","article-title":"Robust text detection in natural scene images","volume":"36","author":"Yin","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Epshtein, B., Ofek, E., and Wexler, Y. (2010, January 13\u201318). Detecting text in natural scenes with stroke width transform. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540041"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Huang, W., Qiao, Y., and Tang, X. (2014, January 6\u201312). Robust scene text detection with convolution neural network induced MSER trees. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10593-2_33"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Huang, W., Lin, Z., Yang, J., and Wang, J. (2013, January 23\u201328). Text localization in natural images using stroke feature transform and text covariance descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA.","DOI":"10.1109\/ICCV.2013.157"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhong, Z., Jin, L., Huang, S., and Feng, Z. (2017, January 5\u20139). DeepText: A new approach for text proposal generation and text detection in natural images. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952348"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liao, M., Shi, B., Bai, X., Wang, X., and Liu, W. (2016). TextBoxes: A fast text detector with a single deep neural network. arXiv.","DOI":"10.1609\/aaai.v31i1.11196"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Tian, Z., Huang, W., He, T., He, P., and Qiao, Y. (2016). Detecting text in natural image with connectionist text proposal network. arXiv.","DOI":"10.1007\/978-3-319-46484-8_4"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., and Tan, C. (2016). Text flow: A unified text detection system in natural scene images. 
arXiv.","DOI":"10.1109\/ICCV.2015.528"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Shen, W., Yao, C., and Bai, X. (2015, January 7\u201312). Symmetry-based text line detection in natural scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298871"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Gupta, A., Vedaldi, A., and Zisserman, A. (2016, January 27\u201330). Synthetic data for text localisation in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.254"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2906","DOI":"10.1016\/j.patcog.2015.04.002","article-title":"A robust approach for text detection from natural scene images","volume":"48","author":"Sun","year":"2015","journal-title":"Pattern Recognit."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Deng, D., Liu, H., Li, X., and Cai, D. (2018, January 2\u20137). Pixellink: Detecting scene text via instance segmentation. Proceedings of the 32nd AAAI Conference Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12269"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., and Bai, X. (2016, January 27\u201330). Multi-oriented text detection with fully convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.451"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Long, S., Ruan, J., Zhang, W., He, X., Wu, W., and Yao, C. (2018, January 8\u201314). Textsnake: A flexible representation for detecting text of arbitrary shapes. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_2"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yang, Q., Cheng, M., Zhou, W., Chen, Y., Qiu, M., and Lin, W. (2018). Inceptext: A new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. arXiv.","DOI":"10.24963\/ijcai.2018\/149"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Shi, B., Bai, X., and Belongie, S. (2017, January 21\u201326). Detecting oriented text in natural images by linking segments. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.371"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., and Liang, J. (2017, January 21\u201326). EAST: An efficient and accurate scene text detector. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.283"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liao, M., Zhu, Z., Shi, B., Xia, G., and Bai, X. (2018, January 18\u201322). Rotation-sensitive regression for oriented scene text detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00619"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, C., Liang, B., Huang, Z., En, M., Han, J., Ding, E., and Ding, X. (2019, January 16\u201320). Look more than once: An accurate detector for text of arbitrary shapes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01080"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Jiang, Y., Zhu, X., Wang, X., Yang, S., Li, W., Wang, H., Fu, P., and Luo, Z. (2018, January 18\u201322). 
R2CNN: Rotational region CNN for arbitrarily-oriented scene text detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/ICPR.2018.8545598"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3111","DOI":"10.1109\/TMM.2018.2818020","article-title":"Arbitrary-oriented scene text detection via rotation proposals","volume":"20","author":"Ma","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3676","DOI":"10.1109\/TIP.2018.2825107","article-title":"TextBoxes++: A single-shot oriented scene text detector","volume":"27","author":"Liao","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"He, P., Huang, W., He, T., Zhu, Q., Qiao, Y., and Li, X. (2017, January 22\u201329). Single shot text detector with regional attention. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.331"},{"key":"ref_26","unstructured":"Bochkovskiy, A., and Wang, C. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1007\/s11831-019-09315-1","article-title":"Review of scene text detection and recognition","volume":"27","author":"Lin","year":"2019","journal-title":"Arch. Comput. Methods Eng."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Brisinello, M., Grbi\u0107, R., Vranje\u0161, M., and Vranje\u0161, D. (2019, January 23\u201325). Review on text detection methods on scene images. Proceedings of the 2019 International Symposium ELMAR, Zadar, Croatia.","DOI":"10.1109\/ELMAR.2019.8918680"},{"key":"ref_29","unstructured":"Raisi, Z., Naiel, M.A., Fieguth, P., Wardell, S., and Zelek, J. (2020). Text detection and recognition in the wild: A review. 
arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L., Mestre, S., Mas, J., and Mota, D.F. (2013, January 25\u201328). ICDAR 2013 robust reading competition. Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA.","DOI":"10.1109\/ICDAR.2013.221"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., and Lu, S. (2015, January 11\u201312). ICDAR2015 competition on robust reading. Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Oradea, Romania.","DOI":"10.1109\/ICDAR.2015.7333942"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Ye, J., Chen, Z., Liu, J., and Du, B. (2021, January 7\u201315). TextFuseNet: Scene Text Detection with Richer Fused Features. Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan.","DOI":"10.24963\/ijcai.2020\/72"},{"key":"ref_34","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. 
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"107026","DOI":"10.1016\/j.patcog.2019.107026","article-title":"Realtime multi-scale scene text detection with scale-based region proposal network","volume":"98","author":"He","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_36","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single shot multibox detector. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_38","unstructured":"Yao, C., Bai, X., Liu, W., Ma, Y., and Tu, Z. (2012, January 16\u201321). Detecting texts of arbitrary orientations in natural images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Nayef, N., Yin, F., Bizid, I., Choi, H., Feng, Y., Karatzas, D., Luo, Z., Pal, U., Rigaud, C., and Chazalon, J. (2017, January 13\u201315). ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan.","DOI":"10.1109\/ICDAR.2017.237"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"He, W., Zhang, X., Yin, F., and Liu, C. (2017, January 21\u201326). Deep direct regression for multi-oriented scene text detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/ICCV.2017.87"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, Y., Xie, H., Fu, Z., and Zhang, Y. (2019, January 10\u201316). DSRN: A deep scale relationship network for scene text detection. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), Macao, China.","DOI":"10.24963\/ijcai.2019\/133"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Liu, Y., and Jin, L. (2018). Deep matching prior network: Toward tighter multi-oriented text detection. arXiv.","DOI":"10.1109\/CVPR.2017.368"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"44219","DOI":"10.1109\/ACCESS.2019.2908933","article-title":"FTPN: Scene text detection with feature pyramid based text proposal network","volume":"7","author":"Liu","year":"2019","journal-title":"IEEE Access"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., and Yan, J. (2018, January 18\u201322). FOTS: Fast oriented text spotting with a unified network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA.","DOI":"10.1109\/CVPR.2018.00595"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lyu, P., Yao, C., Wu, W., Yan, S., and Bai, X. (2018, January 18\u201322). Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00788"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Baek, Y., Lee, B., Han, D., Yun, S., and Lee, H. (2019, January 16\u201320). Character region awareness for text detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00959"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Xu, Y., Duan, J., Kuang, Z., Yue, X., Sun, H., Guan, Y., and Zhang, W. (2019). Geometry Normalization Networks for Accurate Scene Text Detection. arXiv.","DOI":"10.1109\/ICCV.2019.00923"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Liao, M., Wan, Z., Yao, C., Chen, K., and Bai, X. (2019). Real-time Scene Text Detection with Differentiable Binarization. arXiv.","DOI":"10.1609\/aaai.v34i07.6812"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/3\/888\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:16:45Z","timestamp":1760159805000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/3\/888"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,28]]},"references-count":48,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["s21030888"],"URL":"https:\/\/doi.org\/10.3390\/s21030888","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,28]]}}}