{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:28:42Z","timestamp":1772908122740,"version":"3.50.1"},"reference-count":46,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2022,10,24]],"date-time":"2022-10-24T00:00:00Z","timestamp":1666569600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001691","name":"JSPS KAKENHI","doi-asserted-by":"publisher","award":["JP22K04034"],"award-info":[{"award-number":["JP22K04034"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"JSPS KAKENHI","doi-asserted-by":"publisher","award":["2022C-183"],"award-info":[{"award-number":["2022C-183"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"JSPS KAKENHI","doi-asserted-by":"publisher","award":["2021C-589"],"award-info":[{"award-number":["2021C-589"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"JSPS KAKENHI","doi-asserted-by":"publisher","award":["2020C-780"],"award-info":[{"award-number":["2020C-780"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"JSPS KAKENHI","doi-asserted-by":"publisher","award":["2020Q-015"],"award-info":[{"award-number":["2020Q-015"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Waseda University","award":["JP22K04034"],"award-info":[{"award-number":["JP22K04034"]}]},{"name":"Waseda University","award":["2022C-183"],"award-info":[{"award-number":["2022C-183"]}]},{"name":"Waseda University","award":["2021C-589"],"award-info":[{"award-number":["2021C-589"]}]},{"name":"Waseda 
University","award":["2020C-780"],"award-info":[{"award-number":["2020C-780"]}]},{"name":"Waseda University","award":["2020Q-015"],"award-info":[{"award-number":["2020Q-015"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Image super-resolution (ISR) technology aims to enhance resolution and improve image quality. It is widely applied to various real-world applications related to image processing, especially in medical images, while relatively little applied to anime image production. Furthermore, contemporary ISR tools are often based on convolutional neural networks (CNNs), while few methods attempt to use transformers that perform well in other advanced vision tasks. We propose a so-called anime image super-resolution (AISR) method based on the Swin Transformer in this work. The work was carried out in several stages. First, a shallow feature extraction approach was employed to facilitate the features map of the input image\u2019s low-frequency information, which mainly approximates the distribution of detailed information in a spatial structure (shallow feature). Next, we applied deep feature extraction to extract the image semantic information (deep feature). Finally, the image reconstruction method combines shallow and deep features to upsample the feature size and performs sub-pixel convolution to obtain many feature map channels. The novelty of the proposal is the enhancement of the low-frequency information using a Gaussian filter and the introduction of different window sizes to replace the patch merging operations in the Swin Transformer. A high-quality anime dataset was constructed to curb the effects of the model robustness on the online regime. We trained our model on this dataset and tested the model quality. We implement anime image super-resolution tasks at different magnifications (2\u00d7, 4\u00d7, 8\u00d7). 
The results were compared numerically and graphically with those delivered by conventional convolutional neural network-based and transformer-based methods. We demonstrate the experiments numerically using standard peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), respectively. The series of experiments and ablation study showcase that our proposal outperforms others.<\/jats:p>","DOI":"10.3390\/s22218126","type":"journal-article","created":{"date-parts":[[2022,10,24]],"date-time":"2022-10-24T10:09:23Z","timestamp":1666606163000},"page":"8126","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["A Transformer-Based Model for Super-Resolution of Anime Image"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9835-3338","authenticated-orcid":false,"given":"Shizhuo","family":"Xu","sequence":"first","affiliation":[{"name":"Graduate School of Information, Production and System, Waseda University, Kitakyushu 808-0135, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9640-4725","authenticated-orcid":false,"given":"Vibekananda","family":"Dutta","sequence":"additional","affiliation":[{"name":"Institute of Micromechanics and Photonics, Faculty of Mechatronics, Warsaw University of Technology, 00-661 Warszawa, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1026-5902","authenticated-orcid":false,"given":"Xin","family":"He","sequence":"additional","affiliation":[{"name":"Graduate School of Information, Production and System, Waseda University, Kitakyushu 808-0135, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7776-2984","authenticated-orcid":false,"given":"Takafumi","family":"Matsumaru","sequence":"additional","affiliation":[{"name":"Graduate School of Information, Production and System, Waseda 
University, Kitakyushu 808-0135, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,10,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"548","DOI":"10.2307\/2659734","article-title":"Japan Pop! Inside the World of Japanese Popular Culture. Edited by Timothy J. Craig. Armonk, N.Y.: M.E. Sharpe Inc., 2000. ix, 360 pp. $64.95","volume":"60","author":"Kelsky","year":"2001","journal-title":"J. Asian Stud."},{"key":"ref_2","unstructured":"Napier, S.J. (2016). Anime from Akira to Howl\u2019s Moving Castle: Experiencing Contemporary Japanese Animation, St. Martin\u2019s Griffin."},{"key":"ref_3","unstructured":"(2022, October 18). Miss Dai. Available online: https:\/\/weibo.com\/u\/7520558714."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1153","DOI":"10.1109\/TASSP.1981.1163711","article-title":"Cubic convolution interpolation for digital image processing","volume":"29","author":"Keys","year":"1981","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/1049-9652(91)90045-L","article-title":"Improving resolution by image registration","volume":"53","author":"Irani","year":"1991","journal-title":"CVGIP Graph. Model. Image Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.1109\/TPAMI.2010.25","article-title":"Single-image super-resolution using sparse regression and natural image prior","volume":"32","author":"Kim","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2017","DOI":"10.1109\/TIP.2010.2045707","article-title":"Robust web image\/video super-resolution","volume":"19","author":"Xiong","year":"2010","journal-title":"IEEE Trans. 
Image Process."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1109\/38.988747","article-title":"Example-based super-resolution","volume":"22","author":"Freeman","year":"2002","journal-title":"IEEE Comput. Graph. Appl."},{"key":"ref_9","unstructured":"Chang, H., Yeung, D.Y., and Xiong, Y. (2004\u20132, January 27). Super-resolution through neighbor embedding. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA."},{"key":"ref_10","unstructured":"Yang, J., Wright, J., Huang, T., and Ma, Y. (2008, January 24\u201326). Image super-resolution as sparse representation of raw image patches. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2861","DOI":"10.1109\/TIP.2010.2050625","article-title":"Image super-resolution via sparse representation","volume":"19","author":"Yang","year":"2010","journal-title":"IEEE Trans. Image Process."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Dong, C., Loy, C.C., He, K., and Tang, X. (2014). Learning a deep convolutional network for image super-resolution. Computer Vision\u2014ECCV 2014, Proceedings of the 13th European Conference, Zurich, Switzerland, 6\u201312 September 2014, Springer.","DOI":"10.1007\/978-3-319-10593-2_13"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Kim, J., Lee, J.K., and Lee, K.M. (2016, January 27\u201330). Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.182"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lim, B., Son, S., Kim, H., Nah, S., and Mu Lee, K. (2017, January 21\u201326). Enhanced deep residual networks for single image super-resolution. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.151"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kim, J., Lee, J.K., and Lee, K.M. (2016, January 27\u201330). Deeply-recursive convolutional network for image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.181"},{"key":"ref_16","unstructured":"Ying, T., Jian, Y., and Liu, X. (2017, January 21\u201326). Image Super-Resolution via Deep Recursive Residual Network. Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_17","unstructured":"Yu, J., Fan, Y., Yang, J., Xu, N., Wang, Z., Wang, X., and Huang, T. (2018). Wide activation for efficient and accurate image super-resolution. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Shi, W., Caballero, J., Husz\u00e1r, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., and Wang, Z. (2016, January 27\u201330). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.207"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ledig, C., Theis, L., Husz\u00e1r, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., and Wang, Z. (2017, January 21\u201326). Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.19"},{"key":"ref_20","unstructured":"Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative Adversarial Networks. 
arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Vasu, S., Thekke Madam, N., and Rajagopalan, A. (2018, January 8\u201314). Analyzing perception-distortion tradeoff using enhanced perceptual super-resolution network. Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany.","DOI":"10.1007\/978-3-030-11021-5_8"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., and Change Loy, C. (2018, January 8\u201314). Esrgan: Enhanced super-resolution generative adversarial networks. Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany.","DOI":"10.1007\/978-3-030-11021-5_5"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, X., Xie, L., Dong, C., and Shan, Y. (2021, January 11\u201317). Real-esrgan: Training real-world blind super-resolution with pure synthetic data. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00217"},{"key":"ref_24","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Yang, F., Yang, H., Fu, J., Lu, H., and Guo, B. (2020, January 13\u201319). Learning texture transformer network for image super-resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00583"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lu\u010di\u0107, M., and Schmid, C. (2021, January 11\u201317). Vivit: A video vision transformer. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00676"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., and Timofte, R. (2021, January 11\u201317). Swinir: Image restoration using swin transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00210"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yan, C., Shi, G., and Wu, Z. (2021, January 12\u201314). SMIR: A Transformer-Based Model for MRI super-resolution reconstruction. Proceedings of the 2021 IEEE International Conference on Medical Imaging Physics and Engineering (ICMIPE), Hefei, China.","DOI":"10.1109\/ICMIPE53131.2021.9698880"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3365","DOI":"10.1109\/TPAMI.2020.2982166","article-title":"Deep learning for image super-resolution: A survey","volume":"43","author":"Wang","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","unstructured":"Dutta, V., and Zielinska, T. (2015). Networking technologies for robotic applications. arXiv."},{"key":"ref_32","unstructured":"(2022, October 18). Nagadomi. waifu2x. Available online: http:\/\/waifu2x.udp.jp."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"21811","DOI":"10.1007\/s11042-016-4020-z","article-title":"Sketch-based manga retrieval using manga109 dataset","volume":"76","author":"Matsui","year":"2017","journal-title":"Multimed. 
Tools Appl."},{"key":"ref_34","unstructured":"Bevilacqua, M., Roumy, A., Guillemot, C., and Alberi-Morel, M.L. (2012, January 3\u20137). Low-complexity single-image super-resolution based on nonnegative neighbor embedding. Proceedings of the 23rd British Machine Vision Conference (BMVC), Surrey, UK."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Tian, Y., Kong, Y., Zhong, B., and Fu, Y. (2018, January 18\u201322). Residual dense network for image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00262"},{"key":"ref_36","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems 25 (NIPS 2012), Lake Tahoe, NV, USA."},{"key":"ref_37","unstructured":"(2022, October 18). seeprettyface. Available online: seeprettyface.com."},{"key":"ref_38","unstructured":"(2022, October 18). Pixiv. Available online: https:\/\/www.pixiv.net\/."},{"key":"ref_39","unstructured":"Zeyde, R., Elad, M., and Protter, M. (2010). On single image scale-up using sparse-representations. Curves and Surfaces, Proceedings of the 7th International Conference, Avignon, France, 24\u201330 June 2010, Springer."},{"key":"ref_40","unstructured":"Martin, D., Fowlkes, C., Tal, D., and Malik, J. (2001, January 7\u201314). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proceedings of the Eighth IEEE International Conference on Computer Vision ICCV 2001, Vancouver, BC, Canada."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Huang, J.B., Singh, A., and Ahuja, N. (2015, January 7\u201312). Single image super-resolution from transformed self-exemplars. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299156"},{"key":"ref_42","unstructured":"(2022, October 18). wandb. Available online: https:\/\/wandb.ai."},{"key":"ref_43","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). Pytorch: An imperative style, high-performance deep learning library. Proceedings of the Advances in Neural Information Processing Systems 32 (NIPS 2019), Vancouver, BC, Canada."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1145\/2184319.2184337","article-title":"Real-time computer vision with OpenCV","volume":"55","author":"Pulli","year":"2012","journal-title":"Commun. ACM"},{"key":"ref_45","unstructured":"(2022, October 18). timm. Available online: https:\/\/github.com\/rwightman\/pytorch-image-models."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Chen, X., Wang, X., Zhou, J., and Dong, C. (2022). Activating More Pixels in Image Super-Resolution Transformer. 
arXiv.","DOI":"10.1109\/CVPR52729.2023.02142"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/21\/8126\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:01:35Z","timestamp":1760144495000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/21\/8126"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,24]]},"references-count":46,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["s22218126"],"URL":"https:\/\/doi.org\/10.3390\/s22218126","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,24]]}}}