{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T08:00:03Z","timestamp":1780473603012,"version":"3.54.1"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T00:00:00Z","timestamp":1702944000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006374","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["RS-2023-00212780, RS-2023-00222663"],"award-info":[{"award-number":["RS-2023-00212780, RS-2023-00222663"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2023,12,19]]},"abstract":"<jats:p>We present MIRROR, an on-device video virtual try-on (VTO) system that provides realistic, private, and rapid experiences in mobile clothes shopping. Despite recent advancements in generative adversarial networks (GANs) for VTO, designing MIRROR involves two challenges: (1) data discrepancy due to restricted training data that miss various poses, body sizes, and backgrounds and (2) local computation overhead that uses up 24% of battery for converting only a single video. To alleviate the problems, we propose a generalizable VTO GAN that not only discerns intricate human body semantics but also captures domain-invariant features without requiring additional training data. In addition, we craft lightweight, reliable clothes\/pose-tracking that generates refined pixel-wise warping flow without neural-net computation. As a holistic system, MIRROR integrates the new VTO GAN and tracking method with meticulous pre\/post-processing, operating in two distinct phases (on\/offline). Our results on Android smartphones and real-world user videos show that compared to a cutting-edge VTO GAN, MIRROR achieves 6.5\u00d7 better accuracy with 20.1\u00d7 faster video conversion and 16.9\u00d7 less energy consumption.<\/jats:p>","DOI":"10.1145\/3631420","type":"journal-article","created":{"date-parts":[[2024,1,12]],"date-time":"2024-01-12T12:52:04Z","timestamp":1705063924000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["MIRROR"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9392-9836","authenticated-orcid":false,"given":"Dong-Sig","family":"Kang","sequence":"first","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-1335-6593","authenticated-orcid":false,"given":"Eunsu","family":"Baek","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6338-9963","authenticated-orcid":false,"given":"Sungwook","family":"Son","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1319-7071","authenticated-orcid":false,"given":"Youngki","family":"Lee","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8967-3652","authenticated-orcid":false,"given":"Taesik","family":"Gong","sequence":"additional","affiliation":[{"name":"Nokia Bell Labs, Cambridge, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8605-5077","authenticated-orcid":false,"given":"Hyung-Sin","family":"Kim","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,1,12]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3356250.3360044"},{"key":"e_1_2_2_2_1","volume-title":"Single Stage Virtual Try-On Via Deformable Attention Flows. In European Conference on Computer Vision. Springer, 409--425","author":"Bai Shuai","year":"2022","unstructured":"Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, and Hongxia Yang. 2022. Single Stage Virtual Try-On Via Deformable Attention Flows. In European Conference on Computer Vision. Springer, 409--425."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/280814.280821"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.993558"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.24792"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24673-2_3"},{"key":"e_1_2_2_7_1","volume-title":"Large displacement optical flow: descriptor matching in variational motion estimation","author":"Brox Thomas","year":"2010","unstructured":"Thomas Brox and Jitendra Malik. 2010. Large displacement optical flow: descriptor matching in variational motion estimation. IEEE transactions on pattern analysis and machine intelligence 33, 3 (2010), 500--513."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274783.3274834"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.254"},{"key":"e_1_2_2_11_1","volume-title":"Share of online traffic worldwide","author":"Chevalier Stephanie","year":"2020","unstructured":"Stephanie Chevalier. 2021. Share of online traffic worldwide 2020, by device. https:\/\/www.statista.com\/statistics\/296695\/preferred-mobile-payment-service-providers-mature-markets\/"},{"key":"e_1_2_2_12_1","unstructured":"Stephanie Chevalier. 2021. U.S. fashion e-commerce value 2015-2021. https:\/\/www.statista.com\/statistics\/736612\/fashion-e-commerce-market-usa\/"},{"key":"e_1_2_2_13_1","volume-title":"Kundan Sai Prabhu Thota, Sungho Suh, and Paul Lukowicz.","author":"Cho Yunmin","year":"2023","unstructured":"Yunmin Cho, Lala Shakti Swarup Ray, Kundan Sai Prabhu Thota, Sungho Suh, and Paul Lukowicz. 2023. ClothFit: Cloth-Human-Attribute Guided Virtual Try-On Network Using 3D Simulated Dataset. arXiv preprint arXiv:2306.13908 (2023)."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01391"},{"key":"e_1_2_2_15_1","volume-title":"On-device Test Script Generation for Mobile Systems. In 2022 21st ACM\/IEEE International Conference on Information Processing in Sensor Networks (IPSN). IEEE, 477--490","author":"Choi Yousung","year":"2022","unstructured":"Yousung Choi, Ahreum Seo, and Hyung-Sin Kim. 2022. ScriptPainter: Vision-based, On-device Test Script Generation for Mobile Systems. In 2022 21st ACM\/IEEE International Conference on Information Processing in Sensor Networks (IPSN). IEEE, 477--490."},{"key":"e_1_2_2_16_1","first-page":"10","article-title":"ONNX Runtime. https:\/\/onnxruntime.ai\/","volume":"1","author":"ONNX","year":"2021","unstructured":"ONNX Runtime developers. 2021. ONNX Runtime. https:\/\/onnxruntime.ai\/. Version: 1.10.0.","journal-title":"Version"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00125"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.316"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1763974.1764031"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00838"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00762"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00883"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00787"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2087756.2087759"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00346"},{"key":"e_1_2_2_27_1","unstructured":"Geoffrey Hinton Oriol Vinyals Jeff Dean et al. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 2 7 (2015)."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02134"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.179"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58565-5_37"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01053"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1981.1163711"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130829"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACVW52041.2021.00025"},{"key":"e_1_2_2_35_1","volume-title":"Self-correction for human parsing","author":"Li Peike","year":"2020","unstructured":"Peike Li, Yunqiu Xu, Yunchao Wei, and Yi Yang. 2020. Self-correction for human parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020)."},{"key":"e_1_2_2_36_1","volume-title":"Look into person: Joint body parsing & pose estimation network and a new benchmark","author":"Liang Xiaodan","year":"2018","unstructured":"Xiaodan Liang, Ke Gong, Xiaohui Shen, and Liang Lin. 2018. Look into person: Joint body parsing & pose estimation network and a new benchmark. IEEE transactions on pattern analysis and machine intelligence 41, 4 (2018), 871--885."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2408360"},{"key":"e_1_2_2_38_1","volume-title":"FashionTex: Controllable Virtual Try-on with Text and Texture. arXiv preprint arXiv:2305.04451","author":"Lin Anran","year":"2023","unstructured":"Anran Lin, Nanxuan Zhao, Shuliang Ning, Yuda Qiu, Baoyuan Wang, and Xiaoguang Han. 2023. FashionTex: Controllable Virtual Try-on with Text and Texture. arXiv preprint arXiv:2305.04451 (2023)."},{"key":"e_1_2_2_39_1","unstructured":"Bruce D Lucas Takeo Kanade et al. 1981. An iterative image registration technique with an application to stereo vision. Vancouver."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583120.3587045"},{"key":"e_1_2_2_41_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_2_2_42_1","volume-title":"Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767","author":"Redmon Joseph","year":"2018","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"e_1_2_2_44_1","volume-title":"Models matter, so does training: An empirical study of cnns for optical flow estimation","author":"Sun Deqing","year":"2019","unstructured":"Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. 2019. Models matter, so does training: An empirical study of cnns for optical flow estimation. IEEE transactions on pattern analysis and machine intelligence 42, 6 (2019), 1408--1423."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58536-5_24"},{"key":"e_1_2_2_46_1","volume-title":"Towards accurate generative models of video: A new metric & challenges. arXiv preprint arXiv:1812.01717","author":"Unterthiner Thomas","year":"2018","unstructured":"Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly. 2018. Towards accurate generative models of video: A new metric & challenges. arXiv preprint arXiv:1812.01717 (2018)."},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_36"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3431159"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00787"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3380881"},{"key":"e_1_2_2_51_1","volume-title":"Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach. In European Conference on Computer Vision. Springer, 208--224","author":"Youn Jiseok","year":"2022","unstructured":"Jiseok Youn, Jaehun Song, Hyung-Sin Kim, and Saewoong Bahk. 2022. Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach. In European Conference on Computer Vision. Springer, 208--224."},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430726"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3448628"},{"key":"e_1_2_2_55_1","volume-title":"Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in neural information processing systems 31","author":"Zhang Zhilu","year":"2018","unstructured":"Zhilu Zhang and Mert Sabuncu. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. Advances in neural information processing systems 31 (2018)."},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2006.1660446"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475269"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_30"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631420","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3631420","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T17:00:55Z","timestamp":1756314055000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631420"}},"subtitle":["Towards Generalizable On-Device Video Virtual Try-On for Mobile Shopping"],"short-title":[],"issued":{"date-parts":[[2023,12,19]]},"references-count":58,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,19]]}},"alternative-id":["10.1145\/3631420"],"URL":"https:\/\/doi.org\/10.1145\/3631420","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,19]]},"assertion":[{"value":"2024-01-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}