{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T06:19:19Z","timestamp":1774073959389,"version":"3.50.1"},"reference-count":42,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2023,6,25]],"date-time":"2023-06-25T00:00:00Z","timestamp":1687651200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2018R1D1A3B05049058"],"award-info":[{"award-number":["NRF-2018R1D1A3B05049058"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Detecting dense text in scene images is a challenging task due to the high variability, complexity, and overlapping of text areas. To adequately distinguish text instances with high density in scenes, we propose an efficient approach called DenseTextPVT. We first generated high-resolution features at different levels to enable accurate dense text detection, which is essential for dense prediction tasks. Additionally, to enhance the feature representation, we designed the Deep Multi-scale Feature Refinement Network (DMFRN), which effectively detects texts of varying sizes, shapes, and fonts, including small-scale texts. DenseTextPVT, then, is inspired by Pixel Aggregation (PA) similarity vector algorithms to cluster text pixels into correct text kernels in the post-processing step. In this way, our proposed method enhances the precision of text detection and effectively reduces overlapping between text regions under dense adjacent text in natural images. 
The comprehensive experiments indicate the effectiveness of our method on the TotalText, CTW1500, and ICDAR-2015 benchmark datasets in comparison to existing methods.<\/jats:p>","DOI":"10.3390\/s23135889","type":"journal-article","created":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T05:28:02Z","timestamp":1687757282000},"page":"5889","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["DenseTextPVT: Pyramid Vision Transformer with Deep Multi-Scale Feature Refinement Network for Dense Text Detection"],"prefix":"10.3390","volume":"23","author":[{"given":"My-Tham","family":"Dinh","sequence":"first","affiliation":[{"name":"Department of Artificial Intelligence Convergence, Chonnam National University, 77 Yongbong-ro, Gwangju 500-757, Republic of Korea"}]},{"given":"Deok-Jai","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence Convergence, Chonnam National University, 77 Yongbong-ro, Gwangju 500-757, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8756-1382","authenticated-orcid":false,"given":"Guee-Sang","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence Convergence, Chonnam National University, 77 Yongbong-ro, Gwangju 500-757, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Song, X., Zang, Y., Wang, W., Lu, T., Yu, G., and Shen, C. (2019, January 19\u201320). Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00853"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"919","DOI":"10.1109\/TPAMI.2022.3155612","article-title":"Real-time scene text detection with differentiable binarization and adaptive scale fusion","volume":"45","author":"Liao","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","first-page":"2736","article-title":"Arbitrary Shape Text Detection via Segmentation with Probability Map","volume":"45","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Tang, J., Zhang, W., Liu, H., Yang, M., Jiang, B., Hu, G., and Bai, X. (2022, January 20\u201325). Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR52688.2022.00452"},{"key":"ref_5","first-page":"970","article-title":"Robust text detection in natural scene images","volume":"36","author":"Yin","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","unstructured":"Chen, Z., Wang, J., Wang, W., Chen, G., Xie, E., Luo, P., and Lu, T. (2021). FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Hou, W., Lu, T., Yu, G., and Shao, S. (2019, January 19\u201320). Shape robust text detection with progressive scale expansion network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00956"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., and Liang, J. (2017, January 21\u201326). 
EAST: An efficient and accurate scene text detector. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.283"},{"key":"ref_9","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016, June 26\u2013July 1). Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Dai, P., Zhang, S., Zhang, H., and Cao, X. (2021, January 20\u201325). Progressive contour regression for arbitrary-shape scene text detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR46437.2021.00731"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Baek, Y., Lee, B., Han, D., Yun, S., and Lee, H. (2019, January 15\u201320). Character region awareness for text detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00959"},{"key":"ref_12","first-page":"335","article-title":"CentripetalText: An efficient text instance representation for scene text detection","volume":"34","author":"Sheng","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shi, B., Bai, X., and Belongie, S. (2017, January 21\u201326). Detecting oriented text in natural images by linking segments. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.371"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhang, C., Liang, B., Huang, Z., En, M., Han, J., Ding, E., and Ding, X. (2019, January 15\u201320). Look more than once: An accurate detector for text of arbitrary shapes. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01080"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"He, W., Zhang, X.-Y., Yin, F., and Liu, C.-L. (2017, January 22\u201329). Deep direct regression for multi-oriented scene text detection. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.87"},{"key":"ref_16","unstructured":"Ch'ng, C.K., and Chan, C.S. (2017). Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9\u201315 November 2017, IEEE."},{"key":"ref_17","unstructured":"Liu, Y., Jin, L., Zhang, S., and Zhang, S. (2017). Detecting curve text in the wild: New dataset and new solution. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., and Lu, S. (2015, January 23\u201326). ICDAR 2015 competition on robust reading. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia.","DOI":"10.1109\/ICDAR.2015.7333942"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Xue, C., Lu, S., and Zhang, W. (2019). MSR: Multi-scale shape regression for scene text detection. arXiv.","DOI":"10.24963\/ijcai.2019\/139"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Long, S., Ruan, J., Zhang, W., He, X., Wu, W., and Yao, C. (2018, January 8\u201314). TextSnake: A flexible representation for detecting text of arbitrary shapes. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_2"},{"key":"ref_21","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. 
Process. Syst."},{"key":"ref_22","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, 23\u201328 August 2020, Springer International Publishing."},{"key":"ref_23","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, October 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual."},{"key":"ref_24","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 3\u20137). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_25","first-page":"10347","article-title":"Training data-efficient image transformers & distillation through attention","volume":"139","author":"Touvron","year":"2021","journal-title":"Int. Conf. Mach. Learn."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s41095-022-0274-8","article-title":"PVT v2: Improved baselines with pyramid vision transformer","volume":"8","author":"Wang","year":"2022","journal-title":"Comput. Vis. Media"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., and Zhang, W. (2021, January 20\u201325). Fourier contour embedding for arbitrary-shaped text detection. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR46437.2021.00314"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liao, M., Wan, Z., Yao, C., Chen, K., and Bai, X. (2020, January 7\u201312). Real-time scene text detection with differentiable binarization. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6812"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, F., Chen, Y., Wu, F., and Li, X. (2020, January 12\u201316). TextRay: Contour-based geometric modeling for arbitrary-shaped scene text detection. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413819"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"36924","DOI":"10.1109\/ACCESS.2021.3062904","article-title":"Document image binarization with stroke boundary feature guided network","volume":"9","author":"Dang","year":"2021","journal-title":"IEEE Access"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"102106","DOI":"10.1109\/ACCESS.2020.2999069","article-title":"Arbitrary-shaped text detection with adaptive text region representation","volume":"8","author":"Jiang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_33","unstructured":"Raisi, Z., Naiel, M.A., Younes, G., Wardell, S., and Zelek, J.S. (2021, January 20\u201325). Transformer-based text detection in the wild. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Virtual."},{"key":"ref_34","unstructured":"Raisi, Z., Younes, G., and Zelek, J. (2022, January 21\u201325). Arbitrary shape text detection using transformers. Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. 
(2021, October 10\u201317). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_36","unstructured":"Dinh, M.-T., and Lee, G.-S. (July, January 29). Arbitrary-shaped Scene Text Detection based on Multi-scale Feature Enhancement Network. Proceedings of the Korean Information Science Society Conference, Jeju, Korea."},{"key":"ref_37","unstructured":"Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., and Cardoso, M.J. (2017). Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Qu\u00e9bec City, QC, Canada, 14 September 2017, Springer International Publishing."},{"key":"ref_38","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_39","unstructured":"Shrivastava, A., Gupta, A., and Girshick, R. (2016, June 26\u2013July 1). Training region-based object detectors with online hard example mining. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_40","unstructured":"Xie, E., Zang, Y., Shao, S., Yu, G., Yao, C., and Li, G. (2019, January 27\u2013February 1). Scene text detection with supervised pyramid context network. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Lin, J., Jiang, J., Yan, Y., Guo, C., Wang, H., Liu, W., and Wang, H. (2022). DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection. arXiv.","DOI":"10.1109\/ICASSP49357.2023.10094842"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Deng, D., Liu, H., Li, X., and Cai, D. (2018, January 2\u20133). 
PixelLink: Detecting scene text via instance segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12269"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/13\/5889\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:00:30Z","timestamp":1760126430000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/13\/5889"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,25]]},"references-count":42,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["s23135889"],"URL":"https:\/\/doi.org\/10.3390\/s23135889","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,25]]}}}