{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:09:40Z","timestamp":1750219780644,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,9,30]],"date-time":"2022-09-30T00:00:00Z","timestamp":1664496000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            The 2D virtual try-on task aims to transfer a target clothing image to the corresponding region of a person image. Although extensive research has been conducted owing to its wide range of applications, the task remains challenging because of complicated issues such as non-rigid shapes, large occlusions, and arbitrary poses. To this end, we propose a novel network with a structural and textural consistency-preserving mechanism for producing high-fidelity try-on images. Specifically, we first generate the semantic layout of a clothing-agnostic person to obtain the segmentation map, which serves as the transformation condition for the target clothes. Based on a recurrent network structure, a transform lookup is performed to iteratively update a dense flow. Then, we adopt a thin-plate-spline-based warping method to estimate the coarse offset flow for all key-point positions. Guided by this sparse flow, a multi-scale deformable convolution module is designed to iteratively predict fine offsets for densely sampled positions, so that the clothing item and the person's body shape can be accurately aligned. 
Finally, we develop a refinement module to effectively fuse the global and local features, which can render accurate geometric structures of the body parts and maintain texture sharpness of the clothes. Extensive experiments on benchmark datasets demonstrate that our method outperforms other state-of-the-art methods in terms of quantitative and qualitative try-on results. The code is available at:\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/TJU-WEIHAO\/MLCN\">https:\/\/github.com\/TJU-WEIHAO\/MLCN<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3580500","type":"journal-article","created":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T13:20:18Z","timestamp":1674134418000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Multi-Level Consistency Network for High-Fidelity Virtual Try-On"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4630-6422","authenticated-orcid":false,"given":"Hao","family":"Wei","sequence":"first","affiliation":[{"name":"Tianjin University, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8003-4643","authenticated-orcid":false,"given":"Rui","family":"Chen","sequence":"additional","affiliation":[{"name":"Tianjin University, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,3,16]]},"reference":[{"key":"e_1_3_1_2_2","volume-title":"International Conference on Learning Representations","author":"Brock Andrew","year":"2018","unstructured":"Andrew Brock, Jeff Donahue, and Karen Simonyan. 2018. Large scale GAN training for high fidelity natural image synthesis. 
In International Conference on Learning Representations."},{"doi-asserted-by":"publisher","key":"e_1_3_1_3_2","DOI":"10.1109\/TPAMI.2019.2929257"},{"doi-asserted-by":"publisher","key":"e_1_3_1_4_2","DOI":"10.1145\/3386569.3392386"},{"key":"e_1_3_1_5_2","first-page":"14131","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Choi Seunghwan","year":"2021","unstructured":"Seunghwan Choi, Sunghyun Park, Minsoo Lee, and Jaegul Choo. 2021. VITON-HD: High-resolution virtual try-on via misalignment-aware normalization. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 14131\u201314140."},{"key":"e_1_3_1_6_2","article-title":"The Fr\u00e9chet distance between multivariate normal distributions","author":"Dowson D. C.","year":"1982","unstructured":"D. C. Dowson and B. V. Landau. 1982. The Fr\u00e9chet distance between multivariate normal distributions. Journal of Multivariate Analysis (1982).","journal-title":"Journal of Multivariate Analysis"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1007\/BFb0086566","article-title":"Splines minimizing rotation-invariant seminorms in Sobolev spaces","volume":"572","author":"Duchon Jean","year":"1977","unstructured":"Jean Duchon. 1977. Splines minimizing rotation-invariant seminorms in Sobolev spaces. Constructive Theory of Functions of Several Variables 572 (1977), 85\u2013100.","journal-title":"Constructive Theory of Functions of Several Variables"},{"issue":"2","key":"e_1_3_1_8_2","article-title":"Interpretable partitioned embedding for intelligent multi-item fashion outfit composition","volume":"15","author":"Feng Zunlei","year":"2019","unstructured":"Zunlei Feng, Zhenyun Yu, Yongcheng Jing, Sai Wu, Mingli Song, Yezhou Yang, and Junxiao Jiang. 2019. Interpretable partitioned embedding for intelligent multi-item fashion outfit composition. ACM Trans. Multimedia Comput. Commun. Appl. 
15, 2 (2019).","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"issue":"2","key":"e_1_3_1_9_2","article-title":"Transform, warp, and dress: A new transformation-guided model for virtual try-on","volume":"18","author":"Fincato Matteo","year":"2022","unstructured":"Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, and Rita Cucchiara. 2022. Transform, warp, and dress: A new transformation-guided model for virtual try-on. ACM Trans. Multimedia Comput. Commun. Appl. 18, 2 (2022).","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"key":"e_1_3_1_10_2","volume-title":"ACM Multimedia","author":"Gao Xin","year":"2021","unstructured":"Xin Gao, Zhenjing Liu, Zunlei Feng, and Chengji Shen. 2021. Shape controllable virtual try-on for underwear models. In ACM Multimedia."},{"key":"e_1_3_1_11_2","first-page":"16928","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ge Chongjian","year":"2021","unstructured":"Chongjian Ge, Yibing Song, Yuying Ge, and Han Yang. 2021. Disentangled cycle consistency for highly-realistic virtual try-on. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 16928\u201316938."},{"key":"e_1_3_1_12_2","first-page":"8485","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ge Yuying","year":"2021","unstructured":"Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, and Ping Luo. 2021. Parser-free virtual try-on via distilling appearance flows. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8485\u20138493."},{"key":"e_1_3_1_13_2","first-page":"770","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Gong Ke","year":"2018","unstructured":"Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, and Liang Lin. 2018. Instance-level human parsing via part grouping network. 
In Proceedings of the European Conference on Computer Vision. 770\u2013785."},{"key":"e_1_3_1_14_2","first-page":"2672","volume-title":"Advances in Neural Information Processing Systems","author":"Goodfellow Ian J.","year":"2014","unstructured":"Ian J. Goodfellow, Jean Pouget-Abadie, and Mehdi Mirza. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672\u20132680."},{"key":"e_1_3_1_15_2","first-page":"8738","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Gundogdu Erhan","year":"2019","unstructured":"Erhan Gundogdu, Victor Constantin, Amrollah Seifoddini, Minh Dang, Mathieu Salzmann, and Pascal Fua. 2019. GarNet: A two-stream network for fast and accurate 3D cloth draping. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 8738\u20138747."},{"key":"e_1_3_1_16_2","first-page":"10470","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Han Xintong","year":"2019","unstructured":"Xintong Han, Xiaojun Hu, Weilin Huang, and Matthew R. Scott. 2019. ClothFlow: A flow-based model for clothed person generation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 10470\u201310479."},{"key":"e_1_3_1_17_2","first-page":"4480","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Han Xintong","year":"2019","unstructured":"Xintong Han, Zuxuan Wu, Weilin Huang, and Matthew Scott. 2019. FiNet: Compatible and diverse fashion image inpainting. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 4480\u20134490."},{"key":"e_1_3_1_18_2","first-page":"7543","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Han Xintong","year":"2018","unstructured":"Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, and Larry S. Davis. 2018. VITON: An image-based virtual try-on network. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7543\u20137552."},{"key":"e_1_3_1_19_2","first-page":"2171","volume-title":"Proceedings of the IEEE Winter Conference on Applications of Computer Vision","author":"Jandial Surgan","year":"2020","unstructured":"Surgan Jandial, Ayush Chopra, Kumar Ayush, and Mayur Hemani. 2020. SieveNet: A unified framework for robust image-based virtual try-on. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision. 2171\u20132179."},{"key":"e_1_3_1_20_2","first-page":"2287","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops","author":"Jetchev Nikolay","year":"2017","unstructured":"Nikolay Jetchev and Urs Bergmann. 2017. The conditional analogy GAN: Swapping fashion articles on people images. In Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops. 2287\u20132292."},{"key":"e_1_3_1_21_2","volume-title":"Advances in Neural Information Processing Systems","author":"Karras Tero","year":"2021","unstructured":"Tero Karras, Miika Aittala, and Samuli Laine. 2021. Alias-free generative adversarial networks. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_1_22_2","first-page":"4396","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Karras Tero","year":"2019","unstructured":"Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4396\u20134405."},{"key":"e_1_3_1_23_2","first-page":"2287","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops","author":"Lee Hyug Jae","year":"2019","unstructured":"Hyug Jae Lee, Rokkyu Lee, Minseok Kang, and Myounghoon Cho. 2019. LA-VITON: A network for looking-attractive virtual try-on. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops. 2287\u20132292."},{"issue":"3","key":"e_1_3_1_24_2","article-title":"Deep-based self-refined face-top coordination","volume":"17","author":"Li Honglin","year":"2021","unstructured":"Honglin Li, Xiaoyang Mao, Mengdi Xu, and Xiaogang Jin. 2021. Deep-based self-refined face-top coordination. ACM Trans. Multimedia Comput. Commun. Appl. 17, 3 (2021).","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"key":"e_1_3_1_25_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Minar Matiur Rahman","year":"2020","unstructured":"Matiur Rahman Minar, Thai Thanh Tuan, and Heejune Ahn. 2020. 3D reconstruction of clothes using a human body model and its application to image-based virtual try-on. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops."},{"key":"e_1_3_1_26_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Minar Matiur Rahman","year":"2020","unstructured":"Matiur Rahman Minar, Thai Thanh Tuan, Heejune Ahn, Paul Rosin, and Yu-Kun Lai. 2020. CP-VTON+: Clothing shape and texture preserving image-based virtual try-on. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops."},{"key":"e_1_3_1_27_2","first-page":"128","volume-title":"European Conference on Computer Vision","author":"Neverova Natalia","year":"2018","unstructured":"Natalia Neverova, Riza Alp G\u00fcler, and Iasonas Kokkinos. 2018. Dense pose transfer. In European Conference on Computer Vision. 128\u2013143."},{"key":"e_1_3_1_28_2","first-page":"2332","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Park Taesung","year":"2019","unstructured":"Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. 
Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2332\u20132341."},{"key":"e_1_3_1_29_2","first-page":"5967","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Isola Phillip","year":"2017","unstructured":"Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5967\u20135976."},{"doi-asserted-by":"publisher","key":"e_1_3_1_30_2","DOI":"10.1145\/3072959.3073711"},{"key":"e_1_3_1_31_2","first-page":"679","volume-title":"European Conference on Computer Vision","author":"Raj Amit","year":"2018","unstructured":"Amit Raj, Patsorn Sangkloy, Huiwen Chang, James Hays, Duygu Ceylan, and Jingwan Lu. 2018. SwapNet: Image based garment transfer. In European Conference on Computer Vision. 679\u2013695."},{"key":"e_1_3_1_32_2","first-page":"1060","volume-title":"33rd International Conference on Machine Learning","author":"Reed Scott","year":"2016","unstructured":"Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016. Generative adversarial text to image synthesis. In 33rd International Conference on Machine Learning. 1060\u20131069."},{"key":"e_1_3_1_33_2","volume-title":"International Conference on 3D Body Scanning Technologies","author":"Sekine Masahiro","year":"2014","unstructured":"Masahiro Sekine, Kaoru Sugita, Frank Perbet, and Bj\u00f6rn Stenger. 2014. Virtual fitting by single-shot body shape estimation. In International Conference on 3D Body Scanning Technologies."},{"key":"e_1_3_1_34_2","volume-title":"International Conference on Learning Representations","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. 
Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations."},{"key":"e_1_3_1_35_2","first-page":"607","volume-title":"European Conference on Computer Vision","author":"Wang Bochao","year":"2018","unstructured":"Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, and Meng Yang. 2018. Toward characteristic-preserving image-based virtual try-on network. In European Conference on Computer Vision. 607\u2013623."},{"key":"e_1_3_1_36_2","first-page":"14050","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Wang Sheng-Yu","year":"2021","unstructured":"Sheng-Yu Wang, David Bau, and Jun-Yan Zhu. 2021. Sketch your own GAN. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 14050\u201314060."},{"issue":"4","key":"e_1_3_1_37_2","first-page":"600","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang Zhou","year":"2004","unstructured":"Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. 2004. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600\u2013612.","journal-title":"IEEE Transactions on Image Processing"},{"key":"e_1_3_1_38_2","first-page":"1316","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Xu Tao","year":"2018","unstructured":"Tao Xu, Pengchuan Zhang, Qiuyuan Huang, and Han Zhang. 2018. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
1316\u20131324."},{"key":"e_1_3_1_39_2","first-page":"7850","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yang Han","year":"2020","unstructured":"Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, Wangmeng Zuo, and Ping Luo. 2020. Towards photo-realistic virtual try-on by adaptively generating-preserving image content. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7850\u20137859."},{"issue":"1","key":"e_1_3_1_40_2","article-title":"Attribute-wise explainable fashion compatibility modeling","volume":"17","author":"Yang Xin","year":"2021","unstructured":"Xin Yang, Xuemeng Song, Fuli Feng, Haokun Wen, Ling-Yu Duan, and Liqiang Nie. 2021. Attribute-wise explainable fashion compatibility modeling. ACM Trans. Multimedia Comput. Commun. Appl. 17, 1 (2021).","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"key":"e_1_3_1_41_2","first-page":"10510","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Yu Ruiyun","year":"2019","unstructured":"Ruiyun Yu, Xiaoqi Wang, and Xiaohui Xie. 2019. VTNFP: An image-based virtual try-on network with body and clothing feature preservation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 10510\u201310619."},{"key":"e_1_3_1_42_2","first-page":"5391","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zanfir Mihai","year":"2018","unstructured":"Mihai Zanfir, Alin-Ionut Popa, Andrei Zanfir, and Cristian Sminchisescu. 2018. Human appearance transfer. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
5391\u20135399."},{"issue":"2","key":"e_1_3_1_43_2","article-title":"Tell, imagine, and search: End-to-end learning for composing text and image to image retrieval","volume":"18","author":"Zhang Feifei","year":"2022","unstructured":"Feifei Zhang, Mingliang Xu, and Changsheng Xu. 2022. Tell, imagine, and search: End-to-end learning for composing text and image to image retrieval. ACM Trans. Multimedia Comput. Commun. Appl. 18, 2 (2022).","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"doi-asserted-by":"publisher","key":"e_1_3_1_44_2","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_3_1_45_2","volume-title":"SIGGRAPH Asia 2012 Technical Briefs","author":"Zhou Zhenglong","year":"2012","unstructured":"Zhenglong Zhou, Bo Shu, Shaojie Zhuo, Xiaoming Deng, Ping Tan, and Stephen Lin. 2012. Image-based clothes animation for virtual fitting. In SIGGRAPH Asia 2012 Technical Briefs."},{"key":"e_1_3_1_46_2","first-page":"1689","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Zhu Shizhan","year":"2017","unstructured":"Shizhan Zhu, Sanja Fidler, Raquel Urtasun, Dahua Lin, and Chen Change Loy. 2017. Be your own Prada: Fashion synthesis with structural coherence. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 
1689\u20131697."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580500","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580500","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:42Z","timestamp":1750178262000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580500"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,30]]},"references-count":45,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3580500"],"URL":"https:\/\/doi.org\/10.1145\/3580500","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2022,9,30]]},"assertion":[{"value":"2022-07-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-12","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}