{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T03:36:36Z","timestamp":1768880196188,"version":"3.49.0"},"reference-count":16,"publisher":"Fuji Technology Press Ltd.","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["JACIII","J. Adv. Comput. Intell. Intell. Inform."],"published-print":{"date-parts":[[2026,1,20]]},"abstract":"<jats:p>Improving the accuracy of handwritten character string recognition allows handwritten documents to be converted into digital text. This facilitates camera-based text input, enabling robotic process automation to manage documentation tasks. Although this field has seen significant progress, recognizing handwritten Japanese remains particularly challenging due to the difficulty of character segmentation, the wide variety of character types, and the absence of clear word boundaries. These factors make unconstrained handwritten Japanese string recognition particularly difficult for conventional approaches. Moreover, transformer-based models typically require large amounts of annotated training data. This study proposes and investigates a new String Recognition Transformer (SRT) model capable of recognizing unconstrained handwritten Japanese character strings without relying on explicit character segmentation or a large number of training images. The SRT model integrates a convolutional neural network backbone for robust local feature extraction, a Transformer encoder-decoder architecture, and a sliding window strategy that generates overlapping patches. Comparative experiments show that our method achieved a character error rate (CER) of 0.067, significantly outperforming convolutional recurrent neural network, transformer-based optical character recognition, and handwritten text recognition with Vision Transformer, which achieved CERs of 0.664, 0.165, and 0.106, respectively, thereby confirming the effectiveness and robustness of the approach.<\/jats:p>","DOI":"10.20965\/jaciii.2026.p0015","type":"journal-article","created":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T15:02:06Z","timestamp":1768834926000},"page":"15-23","source":"Crossref","is-referenced-by-count":0,"title":["Handwritten Character String Recognition Using a String Recognition Transformer"],"prefix":"10.20965","volume":"30","author":[{"given":"Shunya","family":"Rakuka","sequence":"first","affiliation":[{"name":"Graduate School of Engineering, Mie University, 1577 Kurimamachiya-cho, Tsu, Mie 514-8507, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7171-8197","authenticated-orcid":true,"given":"Kento","family":"Morita","sequence":"additional","affiliation":[{"name":"Graduate School of Engineering, Mie University, 1577 Kurimamachiya-cho, Tsu, Mie 514-8507, Japan"}]},{"given":"Tetsushi","family":"Wakabayashi","sequence":"additional","affiliation":[{"name":"Graduate School of Engineering, Mie University, 1577 Kurimamachiya-cho, Tsu, Mie 514-8507, Japan"}]}],"member":"8550","published-online":{"date-parts":[[2026,1,20]]},"reference":[{"key":"key-10.20965\/jaciii.2026.p0015-1","doi-asserted-by":"crossref","unstructured":"S. Rakuka, K. Morita, and T. Wakabayashi, \u201cHandwritten character string recognition using transformer and CNN features,\u201d Proc. of 2024 Joint 13th Int. Conf. on Soft Computing and Intelligent Systems and 25th Int. Symp. on Advanced Intelligent Systems (SCIS&ISIS), 2024. https:\/\/doi.org\/10.1109\/SCISISIS61014.2024.10759989","DOI":"10.1109\/SCISISIS61014.2024.10759989"},{"key":"key-10.20965\/jaciii.2026.p0015-2","unstructured":"C. Bartz, H. Yang, and C. Meinel, \u201cSTN-OCR: A single neural network for text detection and text recognition,\u201d arXiv:1707.08831, 2017. https:\/\/doi.org\/10.48550\/arXiv.1707.08831"},{"key":"key-10.20965\/jaciii.2026.p0015-3","doi-asserted-by":"crossref","unstructured":"B. Shi, M. Yang, X. Wang, P. Lyu, C. Yao, and X. Bai, \u201cASTER: An attentional scene text recognizer with flexible rectification,\u201d IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.41, No.9, pp. 2035-2048, 2018. https:\/\/doi.org\/10.1109\/TPAMI.2018.2848939","DOI":"10.1109\/TPAMI.2018.2848939"},{"key":"key-10.20965\/jaciii.2026.p0015-4","doi-asserted-by":"crossref","unstructured":"B. Shi, X. Bai, and C. Yao, \u201cAn end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition,\u201d IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.39, No.11, pp. 2298-2304, 2016. https:\/\/doi.org\/10.1109\/TPAMI.2016.2646371","DOI":"10.1109\/TPAMI.2016.2646371"},{"key":"key-10.20965\/jaciii.2026.p0015-5","doi-asserted-by":"crossref","unstructured":"B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai, \u201cRobust scene text recognition with automatic rectification,\u201d Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4168-4176, 2016. https:\/\/doi.org\/10.1109\/CVPR.2016.452","DOI":"10.1109\/CVPR.2016.452"},{"key":"key-10.20965\/jaciii.2026.p0015-6","doi-asserted-by":"crossref","unstructured":"C. Luo, L. Jin, and Z. Sun, \u201cMORAN: A multi-object rectified attention network for scene text recognition,\u201d Pattern Recognition, Vol.90, pp. 109-118, 2019. https:\/\/doi.org\/10.1016\/j.patcog.2019.01.020","DOI":"10.1016\/j.patcog.2019.01.020"},{"key":"key-10.20965\/jaciii.2026.p0015-7","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, \u0141. Kaiser, and I. Polosukhin, \u201cAttention is all you need,\u201d Advances in Neural Information Processing Systems (NIPS 2017), Vol.30, 2017."},{"key":"key-10.20965\/jaciii.2026.p0015-8","unstructured":"A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., \u201cAn image is worth 16x16 words: Transformers for image recognition at scale,\u201d arXiv:2010.11929, 2020. https:\/\/doi.org\/10.48550\/arXiv.2010.11929"},{"key":"key-10.20965\/jaciii.2026.p0015-9","doi-asserted-by":"crossref","unstructured":"M. Li, T. Lv, J. Chen, L. Cui, Y. Lu, D. Florencio, C. Zhang, Z. Li, and F. Wei, \u201cTrOCR: Transformer-based optical character recognition with pre-trained models,\u201d Proc. of the AAAI Conf. on Artificial Intelligence, Vol.37, No.11, pp. 13094-13102, 2023. https:\/\/doi.org\/10.1609\/aaai.v37i11.26538","DOI":"10.1609\/aaai.v37i11.26538"},{"key":"key-10.20965\/jaciii.2026.p0015-10","doi-asserted-by":"crossref","unstructured":"M. Fujitake, \u201cDTrOCR: Decoder-only transformer for optical character recognition,\u201d Proc. of the IEEE\/CVF Winter Conf. on Applications of Computer Vision, pp. 8025-8035, 2024. https:\/\/doi.org\/10.1109\/WACV57701.2024.00784","DOI":"10.1109\/WACV57701.2024.00784"},{"key":"key-10.20965\/jaciii.2026.p0015-11","doi-asserted-by":"crossref","unstructured":"Y. Li, D. Chen, T. Tang, and X. Shen, \u201cHTR-VT: Handwritten text recognition with vision transformer,\u201d Pattern Recognition, Vol.158, Article No.110967, 2025. https:\/\/doi.org\/10.1016\/j.patcog.2024.110967","DOI":"10.1016\/j.patcog.2024.110967"},{"key":"key-10.20965\/jaciii.2026.p0015-12","doi-asserted-by":"crossref","unstructured":"G. Kim, T. Hong, M. Yim, J. Nam, J. Park, J. Yim, W. Hwang, S. Yun, D. Han, and S. Park, \u201cOCR-free document understanding transformer,\u201d European Conf. on Computer Vision, pp. 498-517, 2022. https:\/\/doi.org\/10.1007\/978-3-031-19815-1_29","DOI":"10.1007\/978-3-031-19815-1_29"},{"key":"key-10.20965\/jaciii.2026.p0015-13","doi-asserted-by":"crossref","unstructured":"F. Sheng, Z. Chen, and B. Xu, \u201cNRTR: A no-recurrence sequence-to-sequence model for scene text recognition,\u201d 2019 Int. Conf. on Document Analysis and Recognition (ICDAR), pp. 781-786, 2019. https:\/\/doi.org\/10.1109\/ICDAR.2019.00130","DOI":"10.1109\/ICDAR.2019.00130"},{"key":"key-10.20965\/jaciii.2026.p0015-14","doi-asserted-by":"crossref","unstructured":"J. Li, Y. Xu, T. Lv, L. Cui, C. Zhang, and F. Wei, \u201cDiT: Self-supervised pre-training for document image transformer,\u201d Proc. of the 30th ACM Int. Conf. on Multimedia, pp. 3530-3539, 2022. https:\/\/doi.org\/10.1145\/3503161.3547911","DOI":"10.1145\/3503161.3547911"},{"key":"key-10.20965\/jaciii.2026.p0015-15","unstructured":"I. Sutskever, O. Vinyals, and Q. V. Le, \u201cSequence to sequence learning with neural networks,\u201d Advances in Neural Information Processing Systems, Vol.27, 2014."},{"key":"key-10.20965\/jaciii.2026.p0015-16","doi-asserted-by":"crossref","unstructured":"Y. Baek, B. Lee, D. Han, S. Yun, and H. Lee, \u201cCharacter region awareness for text detection,\u201d Proc. of the IEEE\/CVF Conf. on Computer Vision and Pattern Recognition, pp. 9365-9374, 2019. https:\/\/doi.org\/10.1109\/CVPR.2019.00959","DOI":"10.1109\/CVPR.2019.00959"}],"container-title":["Journal of Advanced Computational Intelligence and Intelligent Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.fujipress.jp\/main\/wp-content\/themes\/Fujipress\/hyosetsu.php?ppno=jacii003000010002","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T15:02:12Z","timestamp":1768834932000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.fujipress.jp\/jaciii\/jc\/jacii003000010015"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,20]]},"references-count":16,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1,20]]},"published-print":{"date-parts":[[2026,1,20]]}},"URL":"https:\/\/doi.org\/10.20965\/jaciii.2026.p0015","relation":{},"ISSN":["1883-8014","1343-0130"],"issn-type":[{"value":"1883-8014","type":"electronic"},{"value":"1343-0130","type":"print"}],"subject":[],"published":{"date-parts":[[2026,1,20]]}}}