{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T04:44:49Z","timestamp":1783053889179,"version":"3.54.6"},"reference-count":56,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2025,7,12]],"date-time":"2025-07-12T00:00:00Z","timestamp":1752278400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,12]],"date-time":"2025-07-12T00:00:00Z","timestamp":1752278400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["416910787"],"award-info":[{"award-number":["416910787"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2025,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The imitation of cursive handwriting is mainly limited to generating handwritten words or lines. Multiple synthetic outputs must be stitched together to create paragraphs or whole pages, whereby consistency and layout information are lost. To close this gap, we propose a method for imitating handwriting at the paragraph level that also works for unseen writing styles. Therefore, we introduce a modified latent diffusion model that enriches the encoder-decoder mechanism with specialized loss functions that explicitly preserve the style and content. We enhance the attention mechanism of the diffusion model with adaptive 2D positional encoding and the conditioning mechanism to work with two modalities simultaneously: a style image and the target text. 
This significantly improves the realism of the generated handwriting. We set a new benchmark in our comprehensive evaluation, achieving 61\u00a0% mAP and 56\u00a0% top-1 accuracy in style preservation, significantly outperforming the previous best method (37\u00a0% mAP, 30\u00a0% top-1). We are making our code publicly available for reproducibility, supporting research in this area and research into potential countermeasures: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/M4rt1nM4yr\/paragraph_handwriting_imitation_ldm\" ext-link-type=\"uri\">https:\/\/github.com\/M4rt1nM4yr\/paragraph_handwriting_imitation_ldm<\/jats:ext-link>\n          <\/jats:p>","DOI":"10.1007\/s11263-025-02525-0","type":"journal-article","created":{"date-parts":[[2025,7,12]],"date-time":"2025-07-12T06:15:27Z","timestamp":1752300927000},"page":"7054-7075","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion 
Models"],"prefix":"10.1007","volume":"133","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3706-285X","authenticated-orcid":false,"given":"Martin","family":"Mayr","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1929-1695","authenticated-orcid":false,"given":"Marcel","family":"Dreier","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1240-5809","authenticated-orcid":false,"given":"Florian","family":"Kordon","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9153-1031","authenticated-orcid":false,"given":"Mathias","family":"Seuret","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3889-6629","authenticated-orcid":false,"given":"Jochen","family":"Z\u00f6llner","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4196-0289","authenticated-orcid":false,"given":"Fei","family":"Wu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9550-5284","authenticated-orcid":false,"given":"Andreas","family":"Maier","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0455-3799","authenticated-orcid":false,"given":"Vincent","family":"Christlein","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,7,12]]},"reference":[{"key":"2525_CR1","doi-asserted-by":"crossref","unstructured":"Aksan, E., Pece, F., & Hilliges, O. (2018). Deepwriting: Making digital ink editable via deep generative modeling. 
Conference on human factors in computing systems (pp. 1-14). New York, NY, USA: Association for Computing Machinery.","DOI":"10.1145\/3173574.3173779"},{"key":"2525_CR2","doi-asserted-by":"crossref","unstructured":"Arandjelovi\u0107, R., & Zisserman, A. (2012). Three things everyone should know to improve object retrieval. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) (pp. 2911-2918). Providence.","DOI":"10.1109\/CVPR.2012.6248018"},{"key":"2525_CR3","doi-asserted-by":"crossref","unstructured":"Bhunia, A.K., Khan, S., Cholakkal, H., Anwer, R.M., Khan, F.S., & Shah, M. (2021). Handwriting transformers. Ieee\/cvf international conference on computer vision (iccv) (pp. 1086-1094).","DOI":"10.1109\/ICCV48922.2021.00112"},{"key":"2525_CR4","doi-asserted-by":"publisher","first-page":"1662","DOI":"10.3389\/fnhum.2016.00488","volume":"10","author":"A Bisio","year":"2016","unstructured":"Bisio, A., Pedull\u00e0, L., Bonzano, L., Ruggeri, P., Brichetto, G., & Bove, M. (2016). Evaluation of Handwriting Movement Kinematics: From an Ecological to a Magnetic Resonance Environment. Front. Hum. Neurosci., 10, 1662\u20135161. https:\/\/doi.org\/10.3389\/fnhum.2016.00488","journal-title":"Front. Hum. Neurosci."},{"key":"2525_CR5","unstructured":"Bi\u0144kowski, M., Sutherland, D.J., Arbel, M., & Gretton, A. (2018). Demystifying MMD GANs. International conference on learning representations (iclr). Retrieved from https:\/\/openreview.net\/forum?id=r1lUOzWCW"},{"key":"2525_CR6","doi-asserted-by":"crossref","unstructured":"Carri\u00e9re, G., Nikolaidou, K., Kordon, F., Mayr, M., Seuret, M., & Christlein, V. (2023). Beyond human forgeries: An investigation into detecting diffusion-generated handwriting. M. Coustaty & A. Forn\u00e9s (Eds.), International conference on document analysis and recognition (icdar) workshops (pp. 5-19). 
Cham: Springer Nature Switzerland.","DOI":"10.1007\/978-3-031-41498-5_1"},{"key":"2525_CR7","unstructured":"Chang, J.-H.R., Shrivastava, A., Koppula, H., Zhang, X., & Tuzel, O. (2022). Style equalization: Unsupervised learning of controllable generative sequence models. K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, & S. Sabato (Eds.), International conference on machine learning (icml) (Vol. 162, pp. 2917-2937). PMLR. Retrieved from https:\/\/proceedings.mlr.press\/v162\/chang22a.html"},{"key":"2525_CR8","unstructured":"Chen, J., Huang, Y., Lv, T., Cui, L., Chen, Q., & Wei, F. (2023). Textdiffuser: Diffusion models as text painters. A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in neural information processing systems (neurips) (Vol. 36, pp. 9353-9387). Curran Associates, Inc."},{"key":"2525_CR9","doi-asserted-by":"crossref","unstructured":"Christlein, V., Bernecker, D., & Angelopoulou, E. (2015). Writer identification using VLAD encoded contour-Zernike moments. International conference on document analysis and recognition (icdar) (pp. 906-910). Nancy.","DOI":"10.1109\/ICDAR.2015.7333893"},{"key":"2525_CR10","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1016\/j.patcog.2016.10.005","volume":"63","author":"V Christlein","year":"2017","unstructured":"Christlein, V., Bernecker, D., H\u00f6nig, F., Maier, A., & Angelopoulou, E. (2017). Writer Identification Using GMM Supervectors and Exemplar-SVMs. Pattern Recognit., 63, 258\u2013267. https:\/\/doi.org\/10.1016\/j.patcog.2016.10.005","journal-title":"Pattern Recognit."},{"key":"2525_CR11","doi-asserted-by":"crossref","unstructured":"Christlein, V., & Maier, A. (2018). Encoding CNN activations for writer recognition. Iapr international workshop on document analysis systems (pp. 169-174). Vienna.","DOI":"10.1109\/DAS.2018.9"},{"key":"2525_CR12","doi-asserted-by":"crossref","unstructured":"Dai, G., Zhang, Y., Wang, Q., Du, Q., Yu, Z., Liu, Z., & Huang, S. 
(2023). Disentangling writer and character styles for handwriting generation. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) (pp. 5977-5986).","DOI":"10.1109\/CVPR52729.2023.00579"},{"key":"2525_CR13","doi-asserted-by":"crossref","unstructured":"Davis, B., Tensmeyer, C., Price, B., Wigington, C., Morse, B., & Jain, R. (2020). Text and style conditioned GAN for generation of offline handwriting lines. British machine vision conference (bmvc). Retrieved from https:\/\/www.bmvc2020-conference.com\/assets\/papers\/0815.pdf","DOI":"10.5244\/C.34.174"},{"key":"2525_CR14","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) (pp. 248-255).","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"2525_CR15","unstructured":"Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, & J.W. Vaughan (Eds.), Advances in neural information processing systems (neurips) (Vol. 34, pp. 8780-8794). Curran Associates, Inc."},{"key":"2525_CR16","doi-asserted-by":"crossref","unstructured":"Ding, H., Luan, B., Gui, D., Chen, K., & Huo, Q. (2023). Improving handwritten ocr with training samples generated by glyph conditional denoising diffusion probabilistic model. Retrieved from https:\/\/arxiv.org\/abs\/2305.19543","DOI":"10.1007\/978-3-031-41685-9_2"},{"key":"2525_CR17","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An image is worth 16\u00d716 words: Transformers for image recognition at scale. International conference on learning representations (iclr). 
Retrieved from https:\/\/openreview.net\/forum?id=YicbFdNTTy"},{"key":"2525_CR18","doi-asserted-by":"crossref","unstructured":"Fogel, S., Averbuch-Elor, H., Cohen, S., Mazor, S., & Litman, R. (2020). Scrabblegan: Semi-supervised varying length handwritten text generation. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) (pp. 4323-4332).","DOI":"10.1109\/CVPR42600.2020.00438"},{"key":"2525_CR19","doi-asserted-by":"publisher","unstructured":"Gan, J., Wang, W., Leng, J., & Gao, X. (2022). HiGAN+: Handwriting imitation GAN with disentangled representations. ACM Trans. Graph., 42(1), 1\u201317. Retrieved from https:\/\/doi.org\/10.1145\/3550070","DOI":"10.1145\/3550070"},{"key":"2525_CR20","unstructured":"Graves, A. (2014). Generating sequences with recurrent neural networks. Retrieved from https:\/\/arxiv.org\/abs\/1308.0850"},{"key":"2525_CR21","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local Nash equilibrium. I. Guyon et al. (Eds.), Advances in neural information processing systems (neurips) (Vol. 30). Curran Associates, Inc."},{"key":"2525_CR22","unstructured":"Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, & H. Lin (Eds.), Advances in neural information processing systems (neurips) (Vol. 33, pp. 6840-6851). Curran Associates, Inc."},{"key":"2525_CR23","unstructured":"Ho, J., & Salimans, T. (2021). Classifier-free diffusion guidance. Advances in neural information processing systems (neurips) workshop on deep generative models and downstream applications. Retrieved from https:\/\/openreview.net\/forum?id=qw8AKxfYbI"},{"key":"2525_CR24","doi-asserted-by":"publisher","unstructured":"Huang, H., Yang, D., Dai, G., Han, Z., Wang, Y., Lam, K.-M., Yang, F., Huang, S., Liu, Y., & He, M. (2022). 
Agtgan: Unpaired image translation for photographic ancient character generation. Acm international conference on multimedia (pp. 5456-5467). New York, NY, USA: Association for Computing Machinery. Retrieved from https:\/\/doi.org\/10.1145\/3503161.3548338","DOI":"10.1145\/3503161.3548338"},{"key":"2525_CR25","doi-asserted-by":"crossref","unstructured":"Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. Ieee international conference on computer vision (iccv).","DOI":"10.1109\/ICCV.2017.167"},{"key":"2525_CR26","doi-asserted-by":"publisher","unstructured":"Kang, L., Riba, P., Rusi\u00f1ol, M., Forn\u00e9s, A., & Villegas, M. (2021). Content and style aware generation of text-line images for handwriting recognition. IEEE PAMI, 1\u20131. https:\/\/doi.org\/10.1109\/TPAMI.2021.3122572","DOI":"10.1109\/TPAMI.2021.3122572"},{"key":"2525_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.108766","volume":"129","author":"L Kang","year":"2022","unstructured":"Kang, L., Riba, P., Rusi\u00f1ol, M., Forn\u00e9s, A., & Villegas, M. (2022). Pay Attention to What You Read: Non-recurrent Handwritten Text-line Recognition. Pattern Recognit., 129, Article 108766. https:\/\/doi.org\/10.1016\/j.patcog.2022.108766","journal-title":"Pattern Recognit."},{"key":"2525_CR28","doi-asserted-by":"crossref","unstructured":"Kang, L., Riba, P., Wang, Y., Rusi\u00f1ol, M., Forn\u00e9s, A., & Villegas, M. (2020). Ganwriting: Content-conditioned generation of styled handwritten word images. A. Vedaldi, H. Bischof, T. Brox, & J.-M. Frahm (Eds.), European conference on computer vision (eccv) (pp. 273-289). Cham: Springer International Publishing.","DOI":"10.1007\/978-3-030-58592-1_17"},{"key":"2525_CR29","unstructured":"Kingma, D.P., & Welling, M. (2014). Auto-encoding variational bayes. 
International conference on learning representations (iclr)."},{"key":"2525_CR30","doi-asserted-by":"crossref","unstructured":"Kleber, F., Fiel, S., Diem, M., & Sablatnig, R. (2013). CVL-DataBase: An Off-Line Database for Writer Retrieval, Writer Identification and Word Spotting. International conference on document analysis and recognition (icdar).","DOI":"10.1109\/ICDAR.2013.117"},{"key":"2525_CR31","doi-asserted-by":"crossref","unstructured":"Kodym, O., & Hradi\u0161, M. (2021). Page layout analysis system for unconstrained historic documents. J. Llad\u00f3s, D. Lopresti, & S. Uchida (Eds.), Document analysis and recognition-icdar 2021 (pp. 492-506). Cham: Springer International Publishing.","DOI":"10.1007\/978-3-030-86331-9_32"},{"key":"2525_CR32","doi-asserted-by":"crossref","unstructured":"Lee, J., Park, S., Baek, J., Oh, S.J., Kim, S., & Lee, H. (2020). On recognizing texts of arbitrary shapes with 2D self-attention. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) workshops.","DOI":"10.1109\/CVPRW50498.2020.00281"},{"key":"2525_CR33","doi-asserted-by":"publisher","unstructured":"Lowe, D.G. (2004). Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis., 60(2), 91-110. https:\/\/doi.org\/10.1023\/B:VISI.0000029664.99615.94","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"2525_CR34","unstructured":"Luhman, T., & Luhman, E. (2020). Diffusion models for handwriting generation."},{"issue":"11","key":"2525_CR35","doi-asserted-by":"publisher","first-page":"8503","DOI":"10.1109\/TNNLS.2022.3151477","volume":"34","author":"C Luo","year":"2023","unstructured":"Luo, C., Zhu, Y., Jin, L., Li, Z., & Peng, D. (2023). Slogan: Handwriting Style Synthesis for Arbitrary-length and Out-of-vocabulary Text. IEEE Trans. Neural Netw. Learn. Syst., 34(11), 8503\u20138515. https:\/\/doi.org\/10.1109\/TNNLS.2022.3151477","journal-title":"IEEE Trans. Neural Netw. Learn. 
Syst."},{"issue":"1","key":"2525_CR36","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1007\/s100320200071","volume":"5","author":"U-V Marti","year":"2002","unstructured":"Marti, U.-V., & Bunke, H. (2002). The IAM-database: An English Sentence Database for Offline Handwriting Recognition. Int. J. Doc. Anal. Recog., 5(1), 39\u201346. https:\/\/doi.org\/10.1007\/s100320200071","journal-title":"Int. J. Doc. Anal. Recog."},{"key":"2525_CR37","doi-asserted-by":"crossref","unstructured":"Mattick, A., Mayr, M., Seuret, M., Maier, A., & Christlein, V. (2021). Smartpatch: Improving handwritten word imitation with patch discriminators. J. Llad\u00f3s, D. Lopresti, & S. Uchida (Eds.), Document analysis and recognition (icdar) (pp. 268-283). Cham: Springer International Publishing.","DOI":"10.1007\/978-3-030-86549-8_18"},{"key":"2525_CR38","doi-asserted-by":"crossref","unstructured":"Mayr, M., Stumpf, M., Nicolaou, A., Seuret, M., Maier, A., & Christlein, V. (2020). Spatio-temporal handwriting imitation. A. Bartoli & A. Fusiello (Eds.), European conference on computer vision (eccv) workshops (pp. 528-543). Cham: Springer International Publishing.","DOI":"10.1007\/978-3-030-68238-5_38"},{"key":"2525_CR39","unstructured":"McInnes, L., Healy, J., & Melville, J. (2020). Umap: Uniform manifold approximation and projection for dimension reduction."},{"key":"2525_CR40","unstructured":"Nichol, A.Q., & Dhariwal, P. (2021). Improved denoising diffusion probabilistic models. M. Meila & T. Zhang (Eds.), International conference on machine learning (icml) (Vol. 139, pp. 8162-8171). PMLR. Retrieved from https:\/\/proceedings.mlr.press\/v139\/nichol21a.html"},{"key":"2525_CR41","doi-asserted-by":"crossref","unstructured":"Nikolaidou, K., Retsinas, G., Christlein, V., Seuret, M., Sfikas, G., Smith, E.B., Mokayed, H., & Liwicki, M. (2023). Wordstylist: Styled verbatim handwritten text generation with latent diffusion models. G.A. Fink, R. Jain, K. Kise, & R. 
Zanibbi (Eds.), Document analysis and recognition (icdar) (pp. 384-401). Cham: Springer Nature Switzerland.","DOI":"10.1007\/978-3-031-41679-8_22"},{"key":"2525_CR42","doi-asserted-by":"crossref","unstructured":"Pippi, V., Cascianelli, S., & Cucchiara, R. (2023). Handwritten text generation from visual archetypes. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) (pp. 22458-22467).","DOI":"10.1109\/CVPR52729.2023.02151"},{"key":"2525_CR43","doi-asserted-by":"crossref","unstructured":"Pippi, V., Quattrini, F., Cascianelli, S., & Cucchiara, R. (2023). Hwd: A novel evaluation score for styled handwritten text generation. British machine vision conference (bmvc). BMVA. Retrieved from https:\/\/papers.bmvc2023.org\/0007.pdf","DOI":"10.1109\/CVPR52729.2023.02151"},{"key":"2525_CR44","unstructured":"Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. M. Meila & T. Zhang (Eds.), International conference on machine learning (icml) (Vol. 139, pp. 8821-8831). PMLR. Retrieved from https:\/\/proceedings.mlr.press\/v139\/ramesh21a.html"},{"key":"2525_CR45","unstructured":"Rezende, D.J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. E.P. Xing & T. Jebara (Eds.), International conference on machine learning (icml) (Vol. 32, pp. 1278-1286). Beijing, China: PMLR. Retrieved from https:\/\/proceedings.mlr.press\/v32\/rezende14.html"},{"key":"2525_CR46","doi-asserted-by":"crossref","unstructured":"Riaz, N., Saifullah, S., Agne, S., Dengel, A., & Ahmed, S. (2024). Stylusai: Stylistic adaptation for robust german handwritten text generation. E.H. Barney Smith, M. Liwicki, & L. Peng (Eds.), Document analysis and recognition-icdar 2024 (pp. 429-444). 
Cham: Springer Nature Switzerland.","DOI":"10.1007\/978-3-031-70536-6_26"},{"key":"2525_CR47","doi-asserted-by":"crossref","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) (pp. 10674-10685).","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"2525_CR48","unstructured":"Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. F. Bach & D. Blei (Eds.), International conference on machine learning (icml) (Vol. 37, pp. 2256-2265). Lille, France: PMLR. Retrieved from https:\/\/proceedings.mlr.press\/v37\/sohl-dickstein15.html"},{"key":"2525_CR49","unstructured":"Song, J., Meng, C., & Ermon, S. (2021). Denoising diffusion implicit models. International conference on learning representations (iclr). Retrieved from https:\/\/openreview.net\/forum?id=St1giarCHLP"},{"key":"2525_CR50","doi-asserted-by":"crossref","unstructured":"Tang, L., Cai, Y., Liu, J., Hong, Z., Gong, M., Fan, M., Han, J., Liu, J., Ding, E., & Wang, J. (2022). Few shot font generation by learning fine-grained local styles. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) (pp. 7895-7904).","DOI":"10.1109\/CVPR52688.2022.00774"},{"key":"2525_CR51","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. I. Guyon et al. (Eds.), Advances in neural information processing systems (Vol. 30). Curran Associates, Inc."},{"key":"2525_CR52","doi-asserted-by":"crossref","unstructured":"Wick, C., Z\u00f6llner, J., & Gr\u00fcning, T. (2021). Transformer for handwritten text recognition using bidirectional post-decoding. J. Llad\u00f3s, D. Lopresti, & S. Uchida (Eds.), Document analysis and recognition-icdar 2021 (pp. 112-126). 
Cham: Springer International Publishing.","DOI":"10.1007\/978-3-030-86334-0_8"},{"key":"2525_CR53","doi-asserted-by":"publisher","unstructured":"Yang, L., Zhang, Z., Song, Y., Hong, S., Xu, R., Zhao, Y., Zhang, W., Cui, B., & Yang, M.-H. (2023). Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv., 56(4), 1-39. https:\/\/doi.org\/10.1145\/3626235","DOI":"10.1145\/3626235"},{"key":"2525_CR54","doi-asserted-by":"crossref","unstructured":"Zdenek, J., & Nakayama, H. (2021). Jokergan: Memory-efficient model for handwritten text generation with text line awareness. Acm international conference on multimedia (pp. 5655-5663). New York, NY, USA: Association for Computing Machinery.","DOI":"10.1145\/3474085.3475713"},{"key":"2525_CR55","doi-asserted-by":"crossref","unstructured":"Zdenek, J., & Nakayama, H. (2023). Handwritten text generation with character specific encoding for style imitation. G.A. Fink, R. Jain, K. Kise, & R. Zanibbi (Eds.), Document analysis and recognition (icdar) (pp. 313-329). Cham: Springer Nature Switzerland.","DOI":"10.1007\/978-3-031-41679-8_18"},{"key":"2525_CR56","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Li, Z., Wang, T., He, M., & Yao, C. (2023). Conditional text image generation with diffusion models. Ieee\/cvf conference on computer vision and pattern recognition (cvpr) (pp. 
14235-14245).","DOI":"10.1109\/CVPR52729.2023.01368"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02525-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-025-02525-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02525-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T08:49:22Z","timestamp":1760086162000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-025-02525-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,12]]},"references-count":56,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10]]}},"alternative-id":["2525"],"URL":"https:\/\/doi.org\/10.1007\/s11263-025-02525-0","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,12]]},"assertion":[{"value":"2 September 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 July 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 July 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of Interest"}},{"value":"The code can be accessed via GitHub link.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}}]}}