{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T13:30:36Z","timestamp":1773063036412,"version":"3.50.1"},"reference-count":153,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T00:00:00Z","timestamp":1705881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>Fashion is the manner in which we introduce ourselves to the world and has become perhaps the biggest industry on the planet. In recent years, fashion-related research has received a lot of attention from computer vision researchers as a result of the growing demand by the fashion industry. Fashion image retrieval (FIR) is a difficult initiative and requires finding the right items from a huge collection of fashion items based on an image query. FIR has been applied successfully to clothing and footwear. Despite ongoing advances, FIR still suffers from limitations when applied to real-world visual endeavors. However, research on complex design items, for example, ornaments, has received less attention due to the complex nature of similarity and the unavailability of suitable datasets. This article presents a review of FIR and evaluation systems from different design datasets. The motivation behind this review is, to sum up the state-of-the-art procedures for retrieving fashion images for a given query image. 
In addition, we highlight promising directions for future research.<\/jats:p>","DOI":"10.1145\/3636552","type":"journal-article","created":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T11:34:21Z","timestamp":1702467261000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["A Survey on Fashion Image Retrieval"],"prefix":"10.1145","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2470-1269","authenticated-orcid":false,"given":"Sk Maidul","family":"Islam","sequence":"first","affiliation":[{"name":"Global Institute of Science & Technology, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1542-3757","authenticated-orcid":false,"given":"Subhankar","family":"Joardar","sequence":"additional","affiliation":[{"name":"Haldia Institute of Technology, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0706-2565","authenticated-orcid":false,"given":"Arif Ahmed","family":"Sekh","sequence":"additional","affiliation":[{"name":"XIM University, India"}]}],"member":"320","published-online":{"date-parts":[[2024,1,22]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00804"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00379"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00186"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"Kenan E. Ak Joo Hwee Lim Jo Yew Tham and Ashraf A. Kassim. 2018. Which shirt for my first date? towards a flexible attribute-based fashion query system. 
Pattern Recognition Letters 112 (2018) 212\u2013218.","DOI":"10.1016\/j.patrec.2018.07.019"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.02080"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298654"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDH51081.2020.00012"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00976"},{"key":"e_1_3_1_10_2","unstructured":"Wei Chen Yu Liu Weiping Wang Tinne Tuytelaars Erwin M. Bakker and Michael Lew. 2020. On the exploration of incremental learning for fine-grained image retrieval. Proceedings of the (BMVC\u201920) BMVC. Retrieved from https:\/\/arxiv.org\/abs\/2010.08020"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00307"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8802944"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447239"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.444"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2017.266"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3444685.3446321"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58580-8_12"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2903661"},{"key":"e_1_3_1_19_2","article-title":"Bert: Pre-training of deep bidirectional transformers for language understanding","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. 
Retrieved from https:\/\/arxiv.org\/abs\/1810.04805","journal-title":"arXiv:1810.04805"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00435"},{"key":"e_1_3_1_21_2","article-title":"Modality-agnostic attention fusion for visual search with text feedback","author":"Dodds Eric","year":"2020","unstructured":"Eric Dodds, Jack Culpepper, Simao Herdade, Yang Zhang, and Kofi Boakye. 2020. Modality-agnostic attention fusion for visual search with text feedback. arXiv:2007.00145. Retrieved from https:\/\/arxiv.org\/abs\/2007.00145","journal-title":"arXiv:2007.00145"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00814"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00957"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3115658"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2017.64"},{"key":"e_1_3_1_26_2","article-title":"M5product: A multi-modal pretraining benchmark for e-commercial product downstream tasks","author":"Dong Xiao","year":"2021","unstructured":"Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Xiaoyong Wei, Minlong Lu, and Xiaodan Liang. 2021. M5product: A multi-modal pretraining benchmark for e-commercial product downstream tasks. arXiv:2109.04275. Retrieved from https:\/\/arxiv.org\/abs\/2109.04275","journal-title":"arXiv:2109.04275"},{"key":"e_1_3_1_27_2","article-title":"Training vision transformers for image retrieval","author":"El-Nouby Alaaeldin","year":"2021","unstructured":"Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, and Herv\u00e9 J\u00e9gou. 2021. Training vision transformers for image retrieval. arXiv:2102.05644. 
Retrieved from https:\/\/arxiv.org\/abs\/2102.05644","journal-title":"arXiv:2102.05644"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00243"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401430"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3512527.3531355"},{"key":"e_1_3_1_31_2","doi-asserted-by":"crossref","unstructured":"Zhanghui Kuang Yiming Gao Guanbin Li Ping Luo Yimin Chen Liang Lin and Wayne Zhang. 2019. Fashion retrieval via graph reasoning networks on a similarity pyramid. In Proceedings of the IEEE\/CVF International Conference on Computer Vision 3066\u20133075.","DOI":"10.1109\/ICCV.2019.00316"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2017.270"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2019.102577"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00548"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00059"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01371"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46466-4_15"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1016-8"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2020.102276"},{"key":"e_1_3_1_40_2","article-title":"Dialog-based interactive image retrieval","volume":"31","author":"Guo Xiaoxiao","year":"2018","unstructured":"Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, and Rogerio Feris. 2018. Dialog-based interactive image retrieval. 
Advances in Neural Information Processing Systems 31 (2018).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.382"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.163"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123394"},{"key":"e_1_3_1_44_2","doi-asserted-by":"crossref","unstructured":"Xiao Han Licheng Yu Xiatian Zhu Li Zhang Yi-Zhe Song and Tao Xiang. 2022. Fashionvil: Fashion-focused vision-and-language representation learning. European Conference on Computer Vision Springer 634\u2013651.","DOI":"10.1007\/978-3-031-19833-5_37"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0116"},{"key":"e_1_3_1_47_2","article-title":"FashionNet: Personalized outfit recommendation with deep neural network","author":"He Tong","year":"2018","unstructured":"Tong He and Yang Hu. 2018. FashionNet: Personalized outfit recommendation with deep neural network. arXiv:1810.02443. 
Retrieved from https:\/\/arxiv.org\/abs\/1810.02443","journal-title":"arXiv:1810.02443"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240546"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806239"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.127"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654885"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3126686.3126773"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42979-021-00734-1"},{"key":"e_1_3_1_54_2","first-page":"100","volume-title":"Proceedings of the International Conference on Computer Vision and Image Processing","author":"Islam Sk Maidul","year":"2020","unstructured":"Sk Maidul Islam, Subhankar Joardar, and Arif Ahmed Sekh. 2020. RingFIR: A large volume earring dataset for fashion image retrieval. In Proceedings of the International Conference on Computer Vision and Image Processing. Springer, 100\u2013111."},{"key":"e_1_3_1_55_2","doi-asserted-by":"crossref","unstructured":"Sk Maidul Islam Subhankar Joardar and Arif Ahmed Sekh. 2021. RingFIR: A large volume earring dataset for fashion image retrieval. 
Springer Singapore 100\u2013111.","DOI":"10.1007\/978-981-16-1092-9_9"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-022-14204-0"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3109859.3109861"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123429"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_19"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967182"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6773"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics9030508"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2017.30"},{"key":"e_1_3_1_64_2","first-page":"1","volume-title":"Proceedings of the 2018 International Conference on Research in Intelligent and Computing in Engineering","author":"Kashilani Divva","year":"2018","unstructured":"Divva Kashilani, Lalit B. Damahe, and Nileshsingh V. Thakur. 2018. An overview of image recognition and retrieval of clothing items. In Proceedings of the 2018 International Conference on Research in Intelligent and Computing in Engineering. 
IEEE, 1\u20136."},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2018.8451281"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i2.16271"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00376"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00316"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00047"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298947"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00267"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00267"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00086"},{"key":"e_1_3_1_74_2","first-page":"1","article-title":"A new algorithm for sketch-based fashion image retrieval based on cross-domain transformation","volume":"2021","author":"Lei Haopeng","year":"2021","unstructured":"Haopeng Lei, Simin Chen, Mingwen Wang, Xiangjian He, Wenjing Jia, and Sibo Li. 2021. A new algorithm for sketch-based fashion image retrieval based on cross-domain transformation. 
Wireless Communications and Mobile Computing 2021 (2021), 1\u201314.","journal-title":"Wireless Communications and Mobile Computing"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1177\/1550147718815627"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2690144"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2542983"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3241399"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.133"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749318"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.227"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/2911996.2912058"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2014.25"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248071"},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.124"},{"key":"e_1_3_1_86_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_15"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/2557642.2563675"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1145\/2483977.2483984"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.126"},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1145\/3372278.3390677"},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6845"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3013631"},{"key":"e_1_3_1_93_2","doi-asserted-by":"crossref","unstructured":"Suvir Mirchandani Licheng Yu Mengjiao Wang Animesh Sinha Wenwen Jiang Tao Xiang and Ning Zhang. 2022. FaD-VLP: Fashion vision-and-language pre-training towards unified retrieval and captioning. 
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 10484\u201310497.","DOI":"10.18653\/v1\/2022.emnlp-main.716"},{"key":"e_1_3_1_94_2","doi-asserted-by":"publisher","DOI":"10.1145\/3512527.3531433"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58595-2_16"},{"key":"e_1_3_1_96_2","first-page":"1","article-title":"Survey on clothing image retrieval with cross-domain","author":"Ning Chen","year":"2022","unstructured":"Chen Ning, Yang Di, and Li Menglu. 2022. Survey on clothing image retrieval with cross-domain. Complex & Intelligent Systems (2022), 1\u201314.","journal-title":"Complex & Intelligent Systems"},{"key":"e_1_3_1_97_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093402"},{"key":"e_1_3_1_98_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.07.092"},{"key":"e_1_3_1_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00042"},{"key":"e_1_3_1_100_2","unstructured":"Negar Rostamzadeh Seyedarian Hosseini Thomas Boquet Wojciech Stokowiec Ying Zhang Christian Jauvin and Chris Pal. 2018. Fashion-Gen: The generative fashion dataset and challenge. Stat 1050 (2018) 30."},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00289"},{"key":"e_1_3_1_102_2","article-title":"Using artificial intelligence to analyze fashion trends","author":"Shi Mengyun","year":"2020","unstructured":"Mengyun Shi and Van Dyk Lewis. 2020. Using artificial intelligence to analyze fashion trends. arXiv:2005.00986. 
Retrieved from https:\/\/arxiv.org\/abs\/2005.00986","journal-title":"arXiv:2005.00986"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11276"},{"key":"e_1_3_1_104_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-020-01305-2"},{"key":"e_1_3_1_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.592"},{"key":"e_1_3_1_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2018.2875860"},{"key":"e_1_3_1_107_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2017.262"},{"key":"e_1_3_1_108_2","article-title":"Where to look and how to describe: Fashion image retrieval with an attentional heterogeneous bilinear network","author":"Su Haibo","year":"2020","unstructured":"Haibo Su, Peng Wang, Lingqiao Liu, Hui Li, Zhen Li, and Yanning Zhang. 2020. Where to look and how to describe: Fashion image retrieval with an attentional heterogeneous bilinear network. IEEE Transactions on Circuits and Systems for Video Technology (2020).","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_109_2","doi-asserted-by":"publisher","DOI":"10.5555\/2826112.2826142"},{"key":"e_1_3_1_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV56688.2023.00107"},{"key":"e_1_3_1_111_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCCAS.2010.5581949"},{"key":"e_1_3_1_112_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01270-0_24"},{"key":"e_1_3_1_113_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. 
Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_114_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.193"},{"key":"e_1_3_1_115_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2018.8451164"},{"key":"e_1_3_1_116_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123326"},{"key":"e_1_3_1_117_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00449"},{"key":"e_1_3_1_118_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00516"},{"key":"e_1_3_1_119_2","doi-asserted-by":"publisher","DOI":"10.1109\/VCIP.2017.8305144"},{"key":"e_1_3_1_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2913513"},{"key":"e_1_3_1_121_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2688133"},{"key":"e_1_3_1_122_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01115"},{"key":"e_1_3_1_123_2","article-title":"Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms","author":"Xiao Han","year":"2017","unstructured":"Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747. 
Retrieved from https:\/\/arxiv.org\/abs\/1708.07747","journal-title":"arXiv:1708.07747"},{"key":"e_1_3_1_124_2","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2355126"},{"key":"e_1_3_1_125_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123396"},{"key":"e_1_3_1_126_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5362"},{"key":"e_1_3_1_127_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.3301403"},{"key":"e_1_3_1_128_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2765836"},{"key":"e_1_3_1_129_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313739"},{"key":"e_1_3_1_130_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.32"},{"key":"e_1_3_1_131_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.93"},{"key":"e_1_3_1_132_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_9"},{"key":"e_1_3_1_133_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462881"},{"key":"e_1_3_1_134_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299064"},{"key":"e_1_3_1_135_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2019.01.001"},{"key":"e_1_3_1_136_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-018-3691-y"},{"key":"e_1_3_1_137_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_36"},{"key":"e_1_3_1_138_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219820"},{"key":"e_1_3_1_139_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i18.18033"},{"key":"e_1_3_1_140_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9413617"},{"key":"e_1_3_1_141_2","first-page":"27196","article-title":"UFC-BERT: Unifying multi-modal controls for conditional image synthesis","author":"Zhang Zhu","year":"2021","unstructured":"Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, and Hongxia Yang. 2021. UFC-BERT: Unifying multi-modal controls for conditional image synthesis. 
In Proceedings of the Advances in Neural Information Processing Systems, 27196\u201327208.","journal-title":"In Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_142_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.652"},{"key":"e_1_3_1_143_2","first-page":"1054","volume-title":"Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence","author":"Zhao Hongrui","year":"2021","unstructured":"Hongrui Zhao, Jin Yu, Yanan Li, Donghui Wang, Jie Liu, Hongxia Yang, and Fei Wu. 2021. Dress like an internet celebrity: Fashion retrieval in videos. In Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence. 1054\u20131060."},{"key":"e_1_3_1_144_2","doi-asserted-by":"publisher","DOI":"10.1177\/0887302X18821187"},{"key":"e_1_3_1_145_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2022.08.011"},{"key":"e_1_3_1_146_2","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2020.3024221"},{"key":"e_1_3_1_147_2","doi-asserted-by":"publisher","DOI":"10.5555\/3304415.3304589"},{"key":"e_1_3_1_148_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019291"},{"key":"e_1_3_1_149_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.186"},{"key":"e_1_3_1_150_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475648"},{"key":"e_1_3_1_151_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.641"},{"key":"e_1_3_1_152_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01246"},{"key":"e_1_3_1_153_2","article-title":"Bingan: Learning compact binary descriptors with a regularized gan","volume":"31","author":"Zieba Maciej","year":"2018","unstructured":"Maciej Zieba, Piotr Semberecki, Tarek El-Gaaly, and Tomasz Trzcinski. 2018. Bingan: Learning compact binary descriptors with a regularized gan. 
Advances in Neural Information Processing Systems 31 (2018).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_154_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00039"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3636552","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3636552","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:04Z","timestamp":1750295404000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3636552"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,22]]},"references-count":153,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3636552"],"URL":"https:\/\/doi.org\/10.1145\/3636552","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,22]]},"assertion":[{"value":"2022-11-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}