{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:16:56Z","timestamp":1760059016825,"version":"build-2065373602"},"reference-count":51,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T00:00:00Z","timestamp":1747267200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Graduate Research Innovation Project of Tianjin, China","award":["2021YJSS024"],"award-info":[{"award-number":["2021YJSS024"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Diffusion-based approaches have recently emerged as powerful alternatives to GAN-based virtual try-on methods, offering improved detail preservation and visual realism. Despite their advantages, the substantial number of parameters and intensive computational requirements pose significant barriers to deployment on low-resource platforms. To tackle these limitations, we propose a diffusion-based virtual try-on framework optimized through feature-level knowledge compression. Our method introduces MP-VTON, an enhanced inpainting pipeline based on Stable Diffusion, which incorporates improved Masking techniques and Pose-conditioned enhancement to alleviate garment boundary artifacts. To reduce model size while maintaining performance, we adopt an attention-guided distillation strategy that transfers semantic and structural knowledge from MP-VTON to a lightweight model, LiteMP-VTON. Experiments demonstrate that LiteMP-VTON achieves nearly a 3\u00d7 reduction in parameter count and close to 2\u00d7 speedup in inference, making it well suited for deployment in resource-limited environments without significantly compromising generation quality.<\/jats:p>","DOI":"10.3390\/info16050408","type":"journal-article","created":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T09:58:27Z","timestamp":1747303107000},"page":"408","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["LiteMP-VTON: A Knowledge-Distilled Diffusion Model for Realistic and Efficient Virtual Try-On"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9888-2587","authenticated-orcid":false,"given":"Shufang","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1524-7553","authenticated-orcid":false,"given":"Lei","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3205-8610","authenticated-orcid":false,"given":"Wenxin","family":"Ding","sequence":"additional","affiliation":[{"name":"School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,5,15]]},"reference":[{"key":"ref_1","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_2","unstructured":"Song, J., Meng, C., and Ermon, S. (2020). Denoising diffusion implicit models. arXiv."},{"key":"ref_3","unstructured":"Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., and Poole, B. (2020). Score-based generative modeling through stochastic differential equations. arXiv."},{"key":"ref_4","first-page":"8780","article-title":"Diffusion models beat gans on image synthesis","volume":"34","author":"Dhariwal","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_5","unstructured":"Ho, J., and Salimans, T. (2022). Classifier-free diffusion guidance. arXiv."},{"key":"ref_6","unstructured":"Li, X., Kampffmeyer, M., Dong, X., Xie, Z., Zhu, F., Dong, H., and Liang, X. (2023). WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhu, L., Yang, D., Zhu, T., Reda, F., Chan, W., Saharia, C., Norouzi, M., and Kemelmacher-Shlizerman, I. (2023, January 17\u201324). Tryondiffusion: A tale of two unets. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00447"},{"key":"ref_8","unstructured":"Morelli, D., Baldrati, A., Cartella, G., Cornia, M., Bertini, M., and Cucchiara, R. (November, January 29). Ladi-vton: Latent diffusion textual-inversion enhanced virtual try-on. Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada."},{"key":"ref_9","unstructured":"Gou, J., Sun, S., Zhang, J., Si, J., Qian, C., and Zhang, L. (November, January 29). Taming the power of diffusion models for high-quality virtual try-on with appearance flow. Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada."},{"key":"ref_10","first-page":"20662","article-title":"Snapfusion: Text-to-image diffusion model on mobile devices within two seconds","volume":"36","author":"Li","year":"2024","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1251","DOI":"10.1109\/TVCG.2015.2459902","article-title":"Mobilefusion: Real-time volumetric surface reconstruction and dense tracking on mobile phones","volume":"21","author":"Kohli","year":"2015","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1251","DOI":"10.1007\/s00190-019-01240-2","article-title":"Thin plate spline interpolation","volume":"93","author":"Keller","year":"2019","journal-title":"J. Geod."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, B., Zheng, H., Liang, X., Chen, Y., Lin, L., and Yang, M. (2018, January 8\u201314). Toward characteristic-preserving image-based virtual try-on network. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_36"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Xie, Z., Huang, Z., Dong, X., Zhao, F., Dong, H., Zhang, X., Zhu, F., and Liang, X. (2023, January 17\u201324). Gp-vton: Towards general purpose virtual try-on via collaborative local-flow global-parsing learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.02255"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wan, Y., Ding, N., and Yao, L. (2024). FA-VTON: A Feature Alignment-Based Model for Virtual Try-On. Appl. Sci., 14.","DOI":"10.3390\/app14125255"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Chen, C., Ni, J., and Zhang, P. (2024). Virtual Try-On Systems in Fashion Consumption: A Systematic Review. Appl. Sci., 14.","DOI":"10.3390\/app142411839"},{"key":"ref_17","unstructured":"Xie, Z., Huang, Z., Zhao, F., Dong, H., Kampffmeyer, M., Dong, X., Zhu, F., and Liang, X. (2022). Pasta-gan++: A versatile framework for high-resolution unpaired virtual try-on. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ge, Y., Song, Y., Zhang, R., Ge, C., Liu, W., and Luo, P. (2021, January 20\u201325). Parser-free virtual try-on via distilling appearance flows. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00838"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Choi, S., Park, S., Lee, M., and Choo, J. (2021, January 20\u201325). Viton-hd: High-resolution virtual try-on via misalignment-aware normalization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01391"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lee, S., Gu, G., Park, S., Choi, S., and Choo, J. (2022, January 23\u201327). High-resolution virtual try-on with misalignment and occlusion-handled conditions. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19790-1_13"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"10225","DOI":"10.1109\/TMM.2024.3405718","article-title":"A two-stage personalized virtual try-on framework with shape control and texture guidance","volume":"26","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Multimed."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yang, Z., Zeng, A., Yuan, C., and Li, Y. (2023, January 1\u20136). Effective whole-body pose estimation with two-stages distillation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCVW60793.2023.00455"},{"key":"ref_23","unstructured":"Hinton, G. (2015). Distilling the Knowledge in a Neural Network. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Joyce, J.M. (2011). Kullback-Leibler Divergence, Springer.","DOI":"10.1007\/978-3-642-04898-2_327"},{"key":"ref_25","unstructured":"M\u00fcller, R., Kornblith, S., and Hinton, G.E. (2025, March 30). When Does Label Smoothing Help? Advances in Neural Information Processing Systems. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3454287.3454709."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhao, B., Cui, Q., Song, R., Qiu, Y., and Liang, J. (2022, January 18\u201324). Decoupled knowledge distillation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01165"},{"key":"ref_27","unstructured":"Kim, J., Park, S., and Kwak, N. (2025, March 30). Paraphrasing Complex Network: Network Compression via Factor Transfer. Advances in Neural Information Processing Systems 2018. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3327144.3327200."},{"key":"ref_28","unstructured":"Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., and Bengio, Y. (2014). Fitnets: Hints for thin deep nets. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Park, W., Kim, D., Lu, Y., and Cho, M. (2019, January 15\u201320). Relational knowledge distillation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00409"},{"key":"ref_30","unstructured":"Salimans, T., and Ho, J. (2022). Progressive distillation for fast sampling of diffusion models. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sun, W., Chen, D., Wang, C., Ye, D., Feng, Y., and Chen, C. (2023, January 10\u201314). Accelerating diffusion sampling with classifier-based feature distillation. Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia.","DOI":"10.1109\/ICME55011.2023.00144"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wang, C., Guo, Z., Duan, Y., Li, H., Chen, N., Tang, X., and Hu, Y. (2024). Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance. arXiv.","DOI":"10.1609\/aaai.v39i7.32820"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yang, D., Liu, S., Yu, J., Wang, H., Weng, C., and Zou, Y. (2022). Norespeech: Knowledge distillation based conditional diffusion model for noise-robust expressive tts. arXiv.","DOI":"10.21437\/Interspeech.2023-645"},{"key":"ref_34","unstructured":"Kingma, D.P. (2013). Auto-encoding variational bayes. arXiv."},{"key":"ref_35","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2015: 18th International Conference, Munich, Germany. Proceedings, Part III 18."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022, January 18\u201324). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Liao, W., Jiang, Y., Liu, R., Feng, Y., Zhang, Y., Hou, J., and Wang, J. (2025). Stable Diffusion-Driven Conditional Image Augmentation for Transformer Fault Detection. Information, 16.","DOI":"10.3390\/info16030197"},{"key":"ref_38","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). Lora: Low-rank adaptation of large language models. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, L., Rao, A., and Agrawala, M. (2023, January 1\u20136). Adding conditional control to text-to-image diffusion models. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"ref_40","unstructured":"Ye, H., Zhang, J., Liu, S., Han, X., and Yang, W. (2023). Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"G\u00fcler, R.A., Neverova, N., and Kokkinos, I. (2018, January 18\u201323). Densepose: Dense human pose estimation in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00762"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Cao, Z., Simon, T., Wei, S.E., and Sheikh, Y. (2017, January 21\u201326). Realtime multi-person 2d pose estimation using part affinity fields. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref_43","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Kim, G., Kwon, T., and Ye, J.C. (2022, January 18\u201324). Diffusionclip: Text-guided diffusion models for robust image manipulation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00246"},{"key":"ref_45","first-page":"36479","article-title":"Photorealistic text-to-image diffusion models with deep language understanding","volume":"35","author":"Saharia","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_46","unstructured":"Simonyan, K. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Zhang, R., Isola, P., Efros, A.A., Shechtman, E., and Wang, O. (2018, January 18\u201323). The unreasonable effectiveness of deep features as a perceptual metric. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00068"},{"key":"ref_49","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2025, March 30). Gans Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. Advances in Neural Information Processing Systems. Available online: https:\/\/dl.acm.org\/doi\/abs\/10.5555\/3295222.3295408."},{"key":"ref_50","unstructured":"Bi\u0144kowski, M., Sutherland, D.J., Arbel, M., and Gretton, A. (2018). Demystifying mmd gans. arXiv."},{"key":"ref_51","unstructured":"Loshchilov, I. (2017). Decoupled weight decay regularization. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/408\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:33:27Z","timestamp":1760031207000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/408"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,15]]},"references-count":51,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["info16050408"],"URL":"https:\/\/doi.org\/10.3390\/info16050408","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2025,5,15]]}}}