{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T16:17:47Z","timestamp":1774369067159,"version":"3.50.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2025,3,12]],"date-time":"2025-03-12T00:00:00Z","timestamp":1741737600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,4,30]]},"abstract":"<jats:p>\n            Deep learning exhibits powerful capability in image inpainting task, particularly in generating pixel-level details closely with the human visual perception. However, the complex background or larger missing regions make it still encounters the artifacts. Many researchers have investigated that prior information is crucial for guiding the image inpainting. In this article, we introduce the dual-attention mechanism, including lightweight spatial attention and linearized attention, to construct an end-to-end texture and structure-guided image inpainting method. In the first stage, we build the detail inpainting network with the lightweight spatial attention. In this model, the extracted texture and structural features are fused with multi-layers and then the fused detail image is considered as the prior to guide the detail repair of corrupted images. In the second stage, we construct the content completing network by the repaired detail and the linearized Transformer module. This module not only overcomes the limitation of the receptive field size of convolutional kernels that can improve the long-range modeling of features but also can significantly reduce the computational complexity of the original Transformer. 
To demonstrate the superior effectiveness of the proposed method, we perform extensive experiments with advanced models on three datasets: CelebA-HQ, Places2, and Paris Street Views. Comparative results manifest that our method achieves excellent image inpainting results that are conform to the human visual system. The code is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/QinLab-WFU\/TSGDAM\">https:\/\/github.com\/QinLab-WFU\/TSGDAM<\/jats:ext-link>\n          <\/jats:p>","DOI":"10.1145\/3715962","type":"journal-article","created":{"date-parts":[[2025,1,31]],"date-time":"2025-01-31T15:04:21Z","timestamp":1738335861000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Texture and Structure-Guided Dual-Attention Mechanism for Image Inpainting"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-7095-0855","authenticated-orcid":false,"given":"Runing","family":"Li","sequence":"first","affiliation":[{"name":"Qufu Normal University, Qufu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1695-4629","authenticated-orcid":false,"given":"Jiangyan","family":"Dai","sequence":"additional","affiliation":[{"name":"Weifang University, Weifang, China and Hefei University of Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7976-318X","authenticated-orcid":false,"given":"Qibing","family":"Qin","sequence":"additional","affiliation":[{"name":"Weifang University, Weifang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1131-1407","authenticated-orcid":false,"given":"Chengduan","family":"Wang","sequence":"additional","affiliation":[{"name":"Weifang University, Weifang, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1012-8089","authenticated-orcid":false,"given":"Huihui","family":"Zhang","sequence":"additional","affiliation":[{"name":"Weifang University, Weifang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9828-0319","authenticated-orcid":false,"given":"Yugen","family":"Yi","sequence":"additional","affiliation":[{"name":"Jiangxi Normal University, Nanchang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,12]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"417","article-title":"Image inpainting","author":"Bertalmio Marcelo","year":"2000","unstructured":"Marcelo Bertalmio, Guillermo Sapiro, Vincent Caselles, and Coloma Ballester. 2000. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, 417\u2013424.","journal-title":"Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.1986.4767851"},{"key":"e_1_3_1_4_2","first-page":"213","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV \u201920)","author":"Carion Nicolas","year":"2020","unstructured":"Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. 
In Proceedings of the European Conference on Computer Vision (ECCV \u201920), 213\u2013229."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.03.014"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2024.3369897"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/29.1644"},{"key":"e_1_3_1_8_2","first-page":"933","volume-title":"Proceedings of the International Conference on Machine Learning (ICML \u201917)","author":"Dauphin Yann N.","year":"2017","unstructured":"Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language modeling with gated convolutional networks. In Proceedings of the International Conference on Machine Learning (ICML \u201917), 933\u2013941."},{"key":"e_1_3_1_9_2","first-page":"483","volume-title":"Proceedings of the European Conference on Computer\u00a0Vision (ECCV\u201922)","author":"Deng Ye","year":"2022","unstructured":"Ye Deng, Siqi Hui, Rongye Meng, Sanping Zhou, and Jinjun Wang. 2022. Hourglass attention network for image inpainting. In Proceedings of the European Conference on Computer\u00a0Vision (ECCV\u201922), 483\u2013501."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3298560"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475426"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548446"},{"issue":"4","key":"e_1_3_1_13_2","doi-asserted-by":"crossref","first-page":"1705","DOI":"10.1109\/TIP.2018.2880681","article-title":"Image inpainting using nonlocal texture matching and nonlinear filtering","volume":"28","author":"Ding Ding","year":"2018","unstructured":"Ding Ding, Sundaresh Ram, and Jeffrey J. Rodr\u00edguez. 2018. Image inpainting using nonlocal texture matching and nonlinear filtering. 
IEEE Transactions on Image Processing 28, 4 (2018), 1705\u20131719.","journal-title":"IEEE Transactions on Image Processing"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830541"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16\u00d716 words: Transformers for image recognition at scale. arxiv:2010.11929. Retrieved from 10.48550\/arXiv.2010.11929","DOI":"10.48550\/arXiv.2010.11929"},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","first-page":"1033","DOI":"10.1109\/ICCV.1999.790383","article-title":"Texture synthesis by non-parametric sampling","author":"Efros Alexei A.","year":"1999","unstructured":"Alexei A. Efros and Thomas K. Leung. 1999. Texture synthesis by non-parametric sampling. In Proceedings of the Seventh IEEE International Conference on Computer Vision IEEE, 1033\u20131038.","journal-title":"Proceedings of the Seventh IEEE International Conference on Computer Vision IEEE"},{"key":"e_1_3_1_17_2","first-page":"10727","volume-title":"Proceedings of the 32nd International Conference on Neural Information Processing Systems","volume":"31","author":"Ghiasi Golnaz","year":"2018","unstructured":"Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V. Le. 2018. DropBlock: A regularization method for convolutional networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Vol. 31, 10727\u201310737."},{"key":"e_1_3_1_18_2","first-page":"262","volume-title":"Proceedings of the 10th European Conference on Computer Vision (ECCV\u201908)","author":"Gray Douglas","year":"2008","unstructured":"Douglas Gray and Hai Tao. 2008. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In Proceedings of the 10th European Conference on Computer Vision (ECCV\u201908). 
Springer, 262\u2013275."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2702738"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01387"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2023.109897"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073659"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.632"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","unstructured":"Tero Karras Timo Aila Samuli Laine and Jaakko Lehtinen. 2017. Progressive growing of GANS for improved quality stability and variation. arXiv:1710.10196. Retrieved from 10.48550\/arXiv.1710.10196","DOI":"10.48550\/arXiv.1710.10196"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980. Retrieved from 10.48550\/arXiv.1412.6980","DOI":"10.48550\/arXiv.1412.6980"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2017.2730822"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00778"},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","unstructured":"Kangshun Li Yunshan Wei Zhen Yang and Wenhua Wei. 2014. Image inpainting algorithm based on TV model and evolutionary algorithm. Soft Computing 20 (2014) 885\u2013893. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:270259389","DOI":"10.1007\/s00500-014-1547-7"},{"key":"e_1_3_1_29_2","first-page":"10748","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Wenbo","year":"2022","unstructured":"Wenbo Li, Zhe Lin, Kun Zhou, Lu Qi, Yi Wang, and Jiaya Jia. 2022. MAT: Mask-aware transformer for large hole image inpainting. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 10748\u201310758."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2024.109672"},{"key":"e_1_3_1_31_2","first-page":"3156","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing","author":"Liao Liang","year":"2018","unstructured":"Liang Liao, Ruimin Hu, Jing Xiao, and Zhongyuan Wang. 2018. Edge-aware context encoder for image inpainting. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 3156\u20133160."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","unstructured":"Lucas D. Lingle. 2023. Transformer-VQ: Linear-time transformers via vector quantization. arXiv:2309.16354. Retrieved from 10.48550\/arXiv.2309.16354","DOI":"10.48550\/arXiv.2309.16354"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01252-6_6"},{"key":"e_1_3_1_34_2","first-page":"725","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV \u201920)","author":"Liu Hongyu","year":"2020","unstructured":"Hongyu Liu, Bin Jiang, Yibing Song, Wei Huang, and Chao Yang. 2020. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations. In Proceedings of the European Conference on Computer Vision (ECCV \u201920), 725\u2013741."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01117"},{"key":"e_1_3_1_36_2","first-page":"3265","volume-title":"Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW\u201919)","author":"Nazeri Kamyar","year":"2019","unstructured":"Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Qureshi, and Mehran Ebrahimi. 2019. EdgeConnect: Structure guided image inpainting using edge prediction. 
In Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW\u201919), 3265\u20133274."},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","unstructured":"Deepak Pathak Philipp Krahenbuhl Jeff Donahue Trevor Darrell and Alexei A. Efros. 2016. Context encoders: Feature learning by inpainting. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) IEEE 2536\u20132544.","DOI":"10.1109\/CVPR.2016.278"},{"key":"e_1_3_1_38_2","volume-title":"9th International Conference on Learning Representations","author":"Peng Hao","year":"2021","unstructured":"Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah Smith, and Lingpeng Kong. 2021. Random feature attention. In 9th International Conference on Learning Representations, ICLR 2021. Virtual Event, Austria."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","unstructured":"Zhen Qin Weixuan Sun Huicai Deng Dongxu Li Yunshen Wei Baohong Lv Junjie Yan Lingpeng Kong and Yiran Zhong. 2022. cosFormer: Rethinking Softmax in attention. arXiv:2202.08791. Retrieved from 10.48550\/arXiv.2202.08791","DOI":"10.48550\/arXiv.2202.08791"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-023-01977-6"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3152624"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00027"},{"key":"e_1_3_1_43_2","first-page":"10674","volume-title":"Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Rombach Robin","year":"2022","unstructured":"Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj\u00f6rn Ommer. 2022. High-resolution image synthesis with latent diffusion models. 
In Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922), 10674\u201310685."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2014.2372479"},{"key":"e_1_3_1_45_2","first-page":"11352","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Sagong Min-cheol","year":"2019","unstructured":"Min-cheol Sagong, Yong-goo Shin, Seung-wook Kim, Seung Park, and Sung-jea Ko. 2019. PEPSI : Fast image inpainting with parallel decoding network. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 11352\u201311360."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00034-019-01029-w"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00323"},{"key":"e_1_3_1_48_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems, Vol. 
30, 5998\u20136008.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00465"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3048629"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3111491"},{"key":"e_1_3_1_54_2","first-page":"5833","volume-title":"Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201919)","author":"Xiong Wei","year":"2019","unstructured":"Wei Xiong, Jiahui Yu, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, and Jiebo Luo. 2019. Foreground-aware image inpainting. In Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201919), 5833\u20135841."},{"key":"e_1_3_1_55_2","first-page":"1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV \u201918)","author":"Yan Zhaoyi","year":"2018","unstructured":"Zhaoyi Yan, Xiaoming Li, Mu Li, Wangmeng Zuo, and Shiguang Shan. 2018. Shift-net: Image inpainting via deep feature rearrangement. In Proceedings of the European Conference on Computer Vision (ECCV \u201918), 1\u201317."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00577"},{"key":"e_1_3_1_57_2","first-page":"12733","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"34","author":"Yu Tao","year":"2020","unstructured":"Tao Yu, Zongyu Guo, Xin Jin, Shilin Wu, Zhibo Chen, Weiping Li, Zhizheng Zhang, and Sen Liu. 2020. Region normalization for image inpainting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 
34, 12733\u201312740."},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475436"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.10.083"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00158"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3219728"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.08.033"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2723009"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3076310"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715962","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715962","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:49Z","timestamp":1750295929000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715962"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,12]]},"references-count":63,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,4,30]]}},"alternative-id":["10.1145\/3715962"],"URL":"https:\/\/doi.org\/10.1145\/3715962","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,12]]},"assertion":[{"value":"2024-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-03-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}