{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T17:24:02Z","timestamp":1772645042891,"version":"3.50.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T00:00:00Z","timestamp":1702252800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Ph.D. in Artificial Intelligence for Society Program of Italy, the MUR PNRR project FAIR","award":["PE00000013"],"award-info":[{"award-number":["PE00000013"]}]},{"name":"NextGenerationEU and the EU H2020 AI4Media Project","award":["951911"],"award-info":[{"award-number":["951911"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,4,30]]},"abstract":"<jats:p>The 2D image-based virtual try-on has aroused increased interest from the multimedia and computer vision fields due to its enormous commercial value. Nevertheless, most existing image-based virtual try-on approaches directly combine the person-identity representation and the in-shop clothing items without taking their mutual correlations into consideration. Moreover, these methods are commonly established on pure convolutional neural networks (CNNs) architectures which are not simple to capture the long-range correlations among the input pixels. As a result, it generally results in inconsistent results. To alleviate these issues, in this article, we propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task. During the first stage, we design a CIT matching block, aiming at precisely capturing the long-range correlations between the cloth-agnostic person information and the in-shop cloth information. Consequently, it makes the warped in-shop clothing items look more natural in appearance. In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask. The empirical results, based on mutual dependencies, demonstrate that the final try-on results are more realistic. Substantial empirical results on a public fashion dataset illustrate that the suggested CIT attains competitive virtual try-on performance.<\/jats:p>","DOI":"10.1145\/3617374","type":"journal-article","created":{"date-parts":[[2023,9,5]],"date-time":"2023-09-05T12:19:46Z","timestamp":1693916386000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Cloth Interactive Transformer for Virtual Try-On"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9790-1504","authenticated-orcid":false,"given":"Bin","family":"Ren","sequence":"first","affiliation":[{"name":"University of Trento and University of Pisa, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2077-1246","authenticated-orcid":false,"given":"Hao","family":"Tang","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5725-2178","authenticated-orcid":false,"given":"Fanyang","family":"Meng","sequence":"additional","affiliation":[{"name":"Peng Cheng Laboratory, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4987-0405","authenticated-orcid":false,"given":"Ding","family":"Runwei","sequence":"additional","affiliation":[{"name":"Peng Cheng Laboratory, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0259-5732","authenticated-orcid":false,"given":"Philip H. S.","family":"Torr","sequence":"additional","affiliation":[{"name":"University of Oxford, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6597-7248","authenticated-orcid":false,"given":"Nicu","family":"Sebe","sequence":"additional","affiliation":[{"name":"University of Trento, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,12,11]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"8387","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Alldieck Thiemo","year":"2018","unstructured":"Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. 2018. Video based reconstruction of 3d people models. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 8387\u20138397."},{"key":"e_1_3_1_3_2","first-page":"409","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Bai Shuai","year":"2022","unstructured":"Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, and Hongxia Yang. 2022. Single stage virtual try-on via deformable attention flows. In Proceedings of the European Conference on Computer Vision. Springer, 409\u2013425."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.993558"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.24792"},{"issue":"4","key":"e_1_3_1_6_2","first-page":"Article\u2013No","article-title":"Design preserving garment transfer","volume":"31","author":"Brouet Remi","year":"2012","unstructured":"Remi Brouet, Alla Sheffer, Laurence Boissieux, and Marie-Paule Cani. 2012. Design preserving garment transfer. ACM Transactions on Graphics 31, 4 (2012), Article\u2013No.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_1_7_2","first-page":"479","volume-title":"Proceedings of the International Conference on 3D Vision","author":"Chen Wenzheng","year":"2016","unstructured":"Wenzheng Chen, Huan Wang, Yangyan Li, Hao Su, Zhenhua Wang, Changhe Tu, Dani Lischinski, Daniel Cohen-Or, and Baoquan Chen. 2016. Synthesizing training images for boosting human 3d pose estimation. In Proceedings of the International Conference on 3D Vision. IEEE, 479\u2013488."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01391"},{"key":"e_1_3_1_9_2","first-page":"8789","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Choi Yunjey","year":"2018","unstructured":"Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. 2018. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 8789\u20138797."},{"key":"e_1_3_1_10_2","first-page":"5433","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Chopra Ayush","year":"2021","unstructured":"Ayush Chopra, Rishabh Jain, Mayur Hemani, and Balaji Krishnamurthy. 2021. Zflow: Gated appearance flow-based virtual try-on with 3d priors. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 5433\u20135442."},{"key":"e_1_3_1_11_2","first-page":"14638","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Cui Aiyu","year":"2021","unstructured":"Aiyu Cui, Daniel McKee, and Svetlana Lazebnik. 2021. Dressing in order: Recurrent person image generation for pose transfer, virtual try-on and outfit editing. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 14638\u201314647."},{"key":"e_1_3_1_12_2","volume-title":"Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics."},{"key":"e_1_3_1_13_2","first-page":"1161","volume-title":"Proceedings of the International Conference on Computer Vision","author":"Dong Haoye","year":"2019","unstructured":"Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-Cheng Chen, and Jian Yin. 2019. Fw-gan: Flow-navigated warping gan for video virtual try-on. In Proceedings of the International Conference on Computer Vision. 1161\u20131170."},{"key":"e_1_3_1_14_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly and others. 2020. An Image is Worth 16x16 Words: Transformers for image recognition at scale. In International Conference on Learning Representations ."},{"key":"e_1_3_1_15_2","first-page":"139","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Mixed and Augmented Reality","author":"Ehara Jun","year":"2006","unstructured":"Jun Ehara and Hideo Saito. 2006. Texture overlay for virtual clothing based on PCA of silhouettes. In Proceedings of the IEEE\/ACM International Symposium on Mixed and Augmented Reality. Citeseer, 139\u2013142."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00226"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491226"},{"key":"e_1_3_1_18_2","first-page":"7669","volume-title":"Proceedings of the 2020 25th International Conference on Pattern Recognition","author":"Fincato Matteo","year":"2021","unstructured":"Matteo Fincato, Federico Landi, Marcella Cornia, Fabio Cesari, and Rita Cucchiara. 2021. VITON-GT: An image-based virtual try-on model with geometric transformations. In Proceedings of the 2020 25th International Conference on Pattern Recognition. IEEE, 7669\u20137676."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01665"},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Xiaoling Gu Jun Yu Yongkang Wong and Mohan S. Kankanhalli. 2020. Toward multi-modal conditioned fashion image translation. IEEE Transactions on Multimedia 23 (2020) 2361\u20132371.","DOI":"10.1109\/TMM.2020.3009500"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185531"},{"key":"e_1_3_1_22_2","first-page":"8739","volume-title":"Proceedings of the International Conference on Computer Vision","author":"Gundogdu Erhan","year":"2019","unstructured":"Erhan Gundogdu, Victor Constantin, Amrollah Seifoddini, Minh Dang, Mathieu Salzmann, and Pascal Fua. 2019. Garnet: A two-stream network for fast and accurate 3d cloth draping. In Proceedings of the International Conference on Computer Vision. 8739\u20138748."},{"key":"e_1_3_1_23_2","first-page":"7543","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Han Xintong","year":"2018","unstructured":"Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis. 2018. Viton: An image-based virtual try-on network. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 7543\u20137552."},{"key":"e_1_3_1_24_2","first-page":"3470","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"He Sen","year":"2022","unstructured":"Sen He, Yi-Zhe Song, and Tao Xiang. 2022. Style-based global appearance flow for virtual try-on. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 3470\u20133479."},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1007\/978-3-030-58565-5_37","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, 23\u201328, 2020","author":"Issenhuth Thibaut","year":"2020","unstructured":"Thibaut Issenhuth, J\u00e9r\u00e9mie Mary, and Cl\u00e9ment Calauzenes. 2020. Do not mask what you do not need to mask: A parser-free virtual try-on. In Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, 23\u201328, 2020. Springer, 619\u2013635."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1469-8137.1912.tb05611.x"},{"key":"e_1_3_1_27_2","first-page":"2287","volume-title":"Proceedings of the International Conference on Computer Vision Workshops","author":"Jetchev Nikolay","year":"2017","unstructured":"Nikolay Jetchev and Urs Bergmann. 2017. The conditional analogy gan: Swapping fashion articles on people images. In Proceedings of the International Conference on Computer Vision Workshops. 2287\u20132292."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_43"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3505244"},{"key":"e_1_3_1_30_2","first-page":"667","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Lahner Zorah","year":"2018","unstructured":"Zorah Lahner, Daniel Cremers, and Tony Tung. 2018. Deepwrinkles: Accurate and realistic clothing modeling. In Proceedings of the European Conference on Computer Vision. 667\u2013684."},{"key":"e_1_3_1_31_2","first-page":"204","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Lee Sangyun","year":"2022","unstructured":"Sangyun Lee, Gyojung Gu, Sunghyun Park, Seunghwan Choi, and Jaegul Choo. 2022. High-resolution virtual try-on with misalignment and occlusion-handled conditions. In Proceedings of the European Conference on Computer Vision. Springer, 204\u2013219."},{"key":"e_1_3_1_32_2","first-page":"15546","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Kedan","year":"2021","unstructured":"Kedan Li, Min Jin Chong, Jeffrey Zhang, and Jingen Liu. 2021. Toward accurate and realistic outfits visualization with attention to details. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 15546\u201315555."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2897897"},{"key":"e_1_3_1_34_2","unstructured":"Yahui Liu Bin Ren Yue Song Wei Bi Nicu Sebe and Wei Wang. 2022. Breaking the chain of gradient leakage in vision transformers. arXiv:2205.12551. Retrieved from https:\/\/arxiv.org\/abs\/2205.12551"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_3_1_36_2","first-page":"5084","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Men Yifang","year":"2020","unstructured":"Yifang Men, Yiming Mao, Yuning Jiang, Wei-Ying Ma, and Zhouhui Lian. 2020. Controllable person image synthesis with attribute-decomposed gan. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 5084\u20135093."},{"key":"e_1_3_1_37_2","first-page":"128","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Mikolajczyk Krystian","year":"2002","unstructured":"Krystian Mikolajczyk and Cordelia Schmid. 2002. An affine invariant interest point detector. In Proceedings of the European Conference on Computer Vision. 128\u2013142."},{"key":"e_1_3_1_38_2","first-page":"11","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops","volume":"2","author":"Minar M. R.","year":"2020","unstructured":"M. R. Minar, T. T. Tuan, H. Ahn, P. Rosin, and Y. K. Lai. 2020. Cp-vton+: Clothing shape and texture preserving image-based virtual try-on. In Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops, Vol. 2. 11."},{"key":"e_1_3_1_39_2","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1007\/978-3-031-20074-8_20","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23\u201327, 2022, Proceedings, Part VIII","author":"Morelli Davide","year":"2022","unstructured":"Davide Morelli, Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, and Rita Cucchiara. 2022. Dress code: High-resolution multi-category virtual try-on. In Proceedings of the Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23\u201327, 2022, Proceedings, Part VIII. 345\u2013362."},{"key":"e_1_3_1_40_2","first-page":"48","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations","author":"Ott Myle","year":"2019","unstructured":"Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. 48\u201353."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073711"},{"key":"e_1_3_1_42_2","first-page":"20382","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ren Bin","year":"2023","unstructured":"Bin Ren, Yahui Liu, Yue Song, Wei Bi, Rita Cucchiara, Nicu Sebe, and Wei Wang. 2023. Masked jigsaw puzzle: A versatile position embedding for vision transformers. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 20382\u201320391."},{"key":"e_1_3_1_43_2","article-title":"Cascaded cross MLP-mixer GANs for cross-view image translation","author":"Ren Bin","year":"2021","unstructured":"Bin Ren, Hao Tang, and Nicu Sebe. 2021. Cascaded cross MLP-mixer GANs for cross-view image translation. British Machine Vision Conference (2021).","journal-title":"British Machine Vision Conference"},{"key":"e_1_3_1_44_2","first-page":"6148","volume-title":"Proceedings of the Computer Vision and Pattern Recognition","author":"Rocco Ignacio","year":"2017","unstructured":"Ignacio Rocco, Relja Arandjelovic, and Josef Sivic. 2017. Convolutional neural network architecture for geometric matching. In Proceedings of the Computer Vision and Pattern Recognition. 6148\u20136157."},{"key":"e_1_3_1_45_2","volume-title":"Proceedings of the Neural Information Processing Systems","author":"Salimans Tim","year":"2016","unstructured":"Tim Salimans, Ian J. Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training GANs. In Proceedings of the Neural Information Processing Systems."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2019.01.012"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.589215"},{"issue":"5","key":"e_1_3_1_48_2","doi-asserted-by":"crossref","first-page":"1041","DOI":"10.1109\/TMM.2016.2639380","article-title":"Privacy preserving cloth try-on using mobile augmented reality","volume":"19","author":"Sekhavat Yoones A.","year":"2016","unstructured":"Yoones A. Sekhavat. 2016. Privacy preserving cloth try-on using mobile augmented reality. IEEE Transactions on Multimedia 19, 5 (2016), 1041\u20131049.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_49_2","first-page":"406","volume-title":"Proceedings of the Int. Conf. on 3D Body Scanning Technologies","author":"Sekine Masahiro","year":"2014","unstructured":"Masahiro Sekine, Kaoru Sugita, Frank Perbet, Bj\u00f6rn Stenger, and Masashi Nishiyama. 2014. Virtual fitting by single-shot body shape estimation. In Proceedings of the Int. Conf. on 3D Body Scanning Technologies. Citeseer, 406\u2013413."},{"key":"e_1_3_1_50_2","first-page":"5027","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Strubell Emma","year":"2018","unstructured":"Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-informed self-attention for semantic role labeling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 5027\u20135038."},{"key":"e_1_3_1_51_2","volume-title":"Proceedings of the British Machine Vision Conference","author":"Tang Hao","year":"2020","unstructured":"Hao Tang, Song Bai, Philip H. S. Torr, and Nicu Sebe. 2020. Bipartite graph reasoning GANs for person image generation. In Proceedings of the British Machine Vision Conference."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58595-2_43"},{"key":"e_1_3_1_53_2","unstructured":"Ilya Tolstikhin Neil Houlsby Alexander Kolesnikov Lucas Beyer Xiaohua Zhai Thomas Unterthiner Jessica Yung Andreas Steiner Daniel Keysers Jakob Uszkoreit Mario Lucic and Alexey Dosovitskiy. 2021. MLP-Mixer: An all-MLP architecture for vision. Advances in Neural Information Processing Systems (2021) 24261\u201324272."},{"key":"e_1_3_1_54_2","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Tsai Yao-Hung Hubert","year":"2019","unstructured":"Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. Multimodal transformer for unaligned multimodal language sequences. In Proceedings of the Annual Meeting of the Association for Computational Linguistics."},{"key":"e_1_3_1_55_2","volume-title":"Proceedings of the Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Neural Information Processing Systems."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_36"},{"key":"e_1_3_1_57_2","first-page":"7794","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Wang Xiaolong","year":"2018","unstructured":"Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018. Non-local neural networks. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 7794\u20137803."},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","unstructured":"Jun Xu Yuanyuan Pu Rencan Nie Dan Xu Zhengpeng Zhao and Wenhua Qian. 2021. Virtual try-on network with attribute transformation and local rendering. IEEE Transactions on Multimedia 23 (2021) 2222\u20132234.","DOI":"10.1109\/TMM.2021.3070972"},{"key":"e_1_3_1_60_2","first-page":"7850","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Yang Han","year":"2020","unstructured":"Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, Wangmeng Zuo, and Ping Luo. 2020. Towards photo-realistic virtual try-on by adaptively generating-preserving image content. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 7850\u20137859."},{"key":"e_1_3_1_61_2","first-page":"10511","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Yu Ruiyun","year":"2019","unstructured":"Ruiyun Yu, Xiaoqi Wang, and Xiaohui Xie. 2019. Vtnfp: An image-based virtual try-on network with body and clothing feature preservation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 10511\u201310520."},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2280560"},{"key":"e_1_3_1_63_2","first-page":"586","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Zhang Richard","year":"2018","unstructured":"Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 586\u2013595."},{"key":"e_1_3_1_64_2","first-page":"1680","volume-title":"Proceedings of the International Conference on Computer Vision","author":"Zhu Shizhan","year":"2017","unstructured":"Shizhan Zhu, Raquel Urtasun, Sanja Fidler, Dahua Lin, and Chen Change Loy. 2017. Be your own prada: Fashion synthesis with structural coherence. In Proceedings of the International Conference on Computer Vision. 1680\u20131688."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617374","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3617374","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:45:54Z","timestamp":1750178754000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617374"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,11]]},"references-count":63,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,4,30]]}},"alternative-id":["10.1145\/3617374"],"URL":"https:\/\/doi.org\/10.1145\/3617374","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,11]]},"assertion":[{"value":"2022-12-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-18","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}