{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T13:58:17Z","timestamp":1764251897731,"version":"build-2065373602"},"reference-count":30,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T00:00:00Z","timestamp":1723593600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Education of the People\u2019s Republic of China, University-Industry Collaborative Education Program","award":["230804602282155"],"award-info":[{"award-number":["230804602282155"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Historical handwritten manuscripts pose challenges to automated recognition techniques due to their unique handwriting styles and cultural backgrounds. In order to solve the problems of complex text word misdetection, omission, and insufficient detection of wide-pitch curved text, this study proposes a high-precision text detection method based on improved YOLOv8s. Firstly, the Swin Transformer is used to replace C2f at the end of the backbone network to solve the shortcomings of fine-grained information loss and insufficient learning features in text word detection. Secondly, the Dysample (Dynamic Upsampling Operator) method is used to retain more detailed features of the target and overcome the shortcomings of information loss in traditional upsampling to realize the text detection task for dense targets. Then, the LSK (Large Selective Kernel) module is added to the detection head to dynamically adjust the feature extraction receptive field, which solves the cases of extreme aspect ratio words, unfocused small text, and complex shape text in text detection. Finally, in order to overcome the CIOU (Complete Intersection Over Union) loss in target box regression with unclear aspect ratio, insensitive to size change, and insufficient correlation between target coordinates, Gaussian Wasserstein Distance (GWD) is introduced to modify the regression loss to measure the similarity between the two bounding boxes in order to obtain high-quality bounding boxes. Compared with the State-of-the-Art methods, the proposed method achieves optimal performance in text detection, with the precision and mAP@0.5 reaching 86.3% and 82.4%, which are 8.1% and 6.7% higher than the original method, respectively. The advancement of each module is verified by ablation experiments. The experimental results show that the method proposed in this study can effectively realize complex text detection and provide a powerful technical means for historical manuscript reproduction.<\/jats:p>","DOI":"10.3390\/info15080483","type":"journal-article","created":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T06:23:05Z","timestamp":1723616585000},"page":"483","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A Historical Handwritten French Manuscripts Text Detection Method in Full Pages"],"prefix":"10.3390","volume":"15","author":[{"given":"Rui","family":"Sang","sequence":"first","affiliation":[{"name":"School of Foreign Languages, North China Electric Power University, Beijing 102206, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shili","family":"Zhao","sequence":"additional","affiliation":[{"name":"College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yan","family":"Meng","sequence":"additional","affiliation":[{"name":"College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingxian","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuefei","family":"Li","sequence":"additional","affiliation":[{"name":"College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huijie","family":"Xia","sequence":"additional","affiliation":[{"name":"School of Foreign Languages, North China Electric Power University, Beijing 102206, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ran","family":"Zhao","sequence":"additional","affiliation":[{"name":"College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,14]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Brisinello, M., Grbi\u0107, R., Stefanovi\u010d, D., and Pe\u010dkai-Kova\u010d, R. (2018, January 2\u20135). Optical Character Recognition on images with colorful background. Proceedings of the 2018 IEEE 8th International Conference on Consumer Electronics-Berlin (ICCE-Berlin), Berlin, Germany.","DOI":"10.1109\/ICCE-Berlin.2018.8576202"},{"key":"ref_2","first-page":"IJERTCONV8IS13029","article-title":"Text Recognition from Images: A Study","volume":"8","author":"Adyanthaya","year":"2020","journal-title":"Int. J. Eng. Res."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE T Pattern Anal."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Proceedings, Part I 14.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., and Lu, S. (2015, January 23\u201326). ICDAR 2015 competition on robust reading. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia.","DOI":"10.1109\/ICDAR.2015.7333942"},{"key":"ref_7","unstructured":"Yao, C., Bai, X., Sang, N., Zhou, X., Zhou, S., and Cao, Z. (2016). Scene text detection via holistic, multi-channel prediction. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Epshtein, B., Ofek, E., and Wexler, Y. (2010, January 13\u201318). Detecting text in natural scenes with stroke width transform. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540041"},{"key":"ref_10","unstructured":"Neumann, L., and Matas, J. (2010, January 8\u201312). A method for text localization and recognition in real-world images. Proceedings of the Computer Vision\u2013ACCV 2010: 10th Asian Conference on Computer Vision, Queenstown, New Zealand. Revised Selected Papers, Part III 10."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Tian, Z., Huang, W., He, T., He, P., and Qiao, Y. (2016, January 11\u201314). Detecting text in natural image with connectionist text proposal network. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Proceedings, Part VIII 14.","DOI":"10.1007\/978-3-319-46484-8_4"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., and Liang, J. (2017, January 21\u201326). East: An efficient and accurate scene text detector. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.283"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Hou, W., Lu, T., Yu, G., and Shao, S. (2019, January 16\u201320). Shape robust text detection with progressive scale expansion network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00956"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, X., Jiang, Y., Luo, Z., Liu, C., Choi, H., and Kim, S. (2019, January 16\u201320). Arbitrary shape scene text detection with adaptive text region representation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00661"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., and Zhang, W. (2021, January 19\u201325). Fourier contour embedding for arbitrary-shaped text detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00314"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, M., Liao, M., Yang, Z., Zhong, H., Tang, J., Cheng, W., Yao, C., Wang, Y., and Bai, X. (2021, January 19\u201325). MOST: A multi-oriented scene text detector with localization refinement. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00870"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ye, M., Zhang, J., Zhao, S., Liu, J., Du, B., and Tao, D. (2023, January 7\u201314). Dptext-detr: Towards better scene text detection with dynamic points in transformer. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v37i3.25430"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Carbonell, M., Mas, J., Villegas, M., Forn\u00e9s, A., and Llad\u00f3s, J. (2019, January 22\u201325). End-to-end handwritten text detection and transcription in full pages. Proceedings of the 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Sydney, NSW, Australia.","DOI":"10.1109\/ICDARW.2019.40077"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1016\/j.gltp.2022.03.028","article-title":"An improved method for text detection using Adam optimization algorithm","volume":"3","author":"Kohli","year":"2022","journal-title":"Glob. Transit. Proc."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"100182","DOI":"10.1016\/j.bdr.2020.100182","article-title":"Digitnet: A deep handwritten digit detection and recognition method using a new historical handwritten digit dataset","volume":"23","author":"Kusetogullari","year":"2021","journal-title":"Big Data Res."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, X., Xie, L., Dong, C., and Shan, Y. (2021, January 11\u201317). Real-esrgan: Training real-world blind super-resolution with pure synthetic data. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00217"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhang, K., Liang, J., Van Gool, L., and Timofte, R. (2021, January 11\u201317). Designing a practical degradation model for deep blind image super-resolution. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00475"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Chen, Z., Zhang, Y., Gu, J., Kong, L., and Yang, X. (2023). Recursive generalization transformer for image super-resolution. arXiv.","DOI":"10.1109\/ICCV51070.2023.01131"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chen, X., Wang, X., Zhou, J., Qiao, Y., and Dong, C. (2023, January 18\u201322). Activating more pixels in image super-resolution transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver BC, Canada.","DOI":"10.1109\/CVPR52729.2023.02142"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Liu, W., Lu, H., Fu, H., and Cao, Z. (2023, January 2\u20136). Learning to upsample by learning to sample. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00554"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, Y., Hou, Q., Zheng, Z., Cheng, M., Yang, J., and Li, X. (2023, January 2\u20136). Large selective kernel network for remote sensing object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01540"},{"key":"ref_28","unstructured":"Yang, X., Yan, J., Ming, Q., Wang, W., Zhang, X., and Tian, Q. (2021, January 18\u201324). Rethinking rotated object detection with gaussian wasserstein distance loss. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liao, M., Wan, Z., Yao, C., Chen, K., and Bai, X. (2020, January 7\u201312). Real-time scene text detection with differentiable binarization. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6812"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Baek, Y., Lee, B., Han, D., Yun, S., and Lee, H. (2019, January 16\u201320). Character region awareness for text detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00959"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/8\/483\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:36:27Z","timestamp":1760110587000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/8\/483"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,14]]},"references-count":30,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["info15080483"],"URL":"https:\/\/doi.org\/10.3390\/info15080483","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2024,8,14]]}}}