{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T04:33:54Z","timestamp":1777696434821,"version":"3.51.4"},"reference-count":24,"publisher":"SAGE Publications","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDA"],"published-print":{"date-parts":[[2023,5,18]]},"abstract":"<jats:p>Learning the similarity between fashion items is essential for many fashion-related tasks. Most methods based on global or local image similarity cannot meet the fine-grained retrieval requirements related to attributes. We are the first to clearly distinguish the concepts of attribute name and their values and divide fashion retrieval tasks that combine images and text into: attribute-guided retrieval and attribute-manipulated retrieval. We propose a hierarchical attribute-aware embedding network (HAEN) that takes images and attributes as input, learns multiple attribute-specific embedding spaces, and measures fine-grained similarity in the corresponding spaces. It can accurately map different attributes to the corresponding areas of the image, thereby facilitating the feature fusion of two different modalities of text and image, including enhancement and replacement. Then on this basis, we propose three attribute-manipulated similarity learning methods, HAEN_Avg, HAEN_Rec, and HAEN_Cmb. With comprehensive validation on two real-world fashion datasets, we demonstrate that our methods can effectively leverage semantic knowledge to improve image retrieval performance, including attribute-guided and attribute-manipulated retrieval tasks.<\/jats:p>","DOI":"10.3233\/ida-226740","type":"journal-article","created":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T13:37:09Z","timestamp":1683898629000},"page":"733-751","source":"Crossref","is-referenced-by-count":1,"title":["Attribute-guided and attribute-manipulated similarity learning network for fashion image retrieval"],"prefix":"10.1177","volume":"27","author":[{"given":"Yongquan","family":"Wan","sequence":"first","affiliation":[{"name":"School of Computer Engineering and Science, Shanghai University, Shanghai, China"},{"name":"School of Information Technology, Shanghai Jian Qiao University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cairong","family":"Yan","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Donghua University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guobing","family":"Zou","sequence":"additional","affiliation":[{"name":"School of Computer Engineering and Science, Shanghai University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bofeng","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","reference":[{"issue":"4","key":"10.3233\/IDA-226740_ref1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3447239","article-title":"Fashion meets computer vision: A survey","volume":"54","author":"Cheng","year":"2021","journal-title":"ACM Computing Surveys"},{"key":"10.3233\/IDA-226740_ref2","doi-asserted-by":"crossref","unstructured":"J. Huang, R.S. Feris, Q. Chen and S. Yan, Cross-domain image retrieval with a dual attribute-aware ranking network, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1062\u20131070.","DOI":"10.1109\/ICCV.2015.127"},{"key":"10.3233\/IDA-226740_ref3","doi-asserted-by":"crossref","unstructured":"Z. Kuang, Y. Gao, G. Li, P. Luo, Y. Chen, L. Lin and W. Zhang, Fashion retrieval via graph reasoning networks on a similarity pyramid, in: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 2019, pp. 3066\u20133075.","DOI":"10.1109\/ICCV.2019.00316"},{"key":"10.3233\/IDA-226740_ref4","doi-asserted-by":"crossref","unstructured":"H. Wen, X. Song, X. Yang, Y. Zhan and L. Nie, Comprehensive linguistic-visual composition network for image retrieval, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1369\u20131378.","DOI":"10.1145\/3404835.3462967"},{"key":"10.3233\/IDA-226740_ref5","doi-asserted-by":"crossref","unstructured":"S. Jandial, P. Badjatiya, P. Chawla, A. Chopra, M. Sarkar and B. Krishnamurthy, Sac: Semantic attention composition for text-conditioned image retrieval, in: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 4021\u20134030.","DOI":"10.1109\/WACV51458.2022.00067"},{"key":"10.3233\/IDA-226740_ref6","doi-asserted-by":"crossref","unstructured":"Y. Yang, M. Wang, W. Zhou and H. Li, Cross-modal joint prediction and alignment for composed query image retrieval, in: Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 3303\u20133311.","DOI":"10.1145\/3474085.3475483"},{"key":"10.3233\/IDA-226740_ref7","doi-asserted-by":"crossref","unstructured":"N. Vo, L. Jiang, C. Sun, K. Murphy, L.-J. Li, L. Fei-Fei and J. Hays, Composing text and image for image retrieval-an empirical odyssey, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp.\u00a06439\u20136448.","DOI":"10.1109\/CVPR.2019.00660"},{"key":"10.3233\/IDA-226740_ref8","doi-asserted-by":"crossref","unstructured":"Z. Liu, P. Luo, S. Qiu, X. Wang and X. Tang, Deepfashion: Powering robust clothes recognition and retrieval with rich annotations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1096\u20131104.","DOI":"10.1109\/CVPR.2016.124"},{"key":"10.3233\/IDA-226740_ref9","doi-asserted-by":"crossref","unstructured":"X. Ji, W. Wang, M. Zhang and Y. Yang, Cross-domain image retrieval with attention modeling, in: Proceedings of the 25th ACM International Conference on Multimedia, 2017, pp. 1654\u20131662.","DOI":"10.1145\/3123266.3123429"},{"key":"10.3233\/IDA-226740_ref10","doi-asserted-by":"crossref","unstructured":"Z. Wang, Y. Gu, Y. Zhang, J. Zhou and X. Gu, Clothing retrieval with visual attention model, in: 2017 IEEE Visual Communications and Image Processing (VCIP), IEEE, 2017, pp. 1\u20134.","DOI":"10.1109\/VCIP.2017.8305144"},{"key":"10.3233\/IDA-226740_ref11","doi-asserted-by":"crossref","unstructured":"X. Han, Z. Wu, Y.-G. Jiang and L.S. Davis, Learning fashion compatibility with bidirectional lstms, in: Proceedings of the 25th ACM International Conference on Multimedia, 2017, pp. 1078\u20131086.","DOI":"10.1145\/3123266.3123394"},{"key":"10.3233\/IDA-226740_ref12","doi-asserted-by":"crossref","unstructured":"A. Veit, S. Belongie and T. Karaletsos, Conditional similarity networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 830\u2013838.","DOI":"10.1109\/CVPR.2017.193"},{"key":"10.3233\/IDA-226740_ref13","doi-asserted-by":"crossref","unstructured":"L. Liao, X. He, B. Zhao, C.-W. Ngo and T.-S. Chua, Interpretable multimodal retrieval for fashion products, in: Proceedings of the 26th ACM International Conference on Multimedia, 2018, pp. 1571\u20131579.","DOI":"10.1145\/3240508.3240646"},{"key":"10.3233\/IDA-226740_ref14","doi-asserted-by":"crossref","unstructured":"Z. Ma, J. Dong, Z. Long, Y. Zhang, Y. He, H. Xue and S. Ji, Fine-grained fashion similarity learning by attribute-specific embedding network, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 11741\u201311748.","DOI":"10.1609\/aaai.v34i07.6845"},{"key":"10.3233\/IDA-226740_ref15","doi-asserted-by":"crossref","unstructured":"C. Yan, A. Ding, Y. Zhang and Z. Wang, Learning fashion similarity based on hierarchical attribute embedding, in: 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2021, pp. 1\u20138.","DOI":"10.1109\/DSAA53316.2021.9564236"},{"key":"10.3233\/IDA-226740_ref16","doi-asserted-by":"crossref","unstructured":"Y. Chen, S. Gong and L. Bazzani, Image search with text feedback by visiolinguistic attention learning, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 3001\u20133011.","DOI":"10.1109\/CVPR42600.2020.00307"},{"key":"10.3233\/IDA-226740_ref17","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1016\/j.patrec.2020.12.001","article-title":"Clothes image caption generation with attribute detection and visual attention model","volume":"141","author":"Li","year":"2021","journal-title":"Pattern Recognition Letters"},{"key":"10.3233\/IDA-226740_ref18","doi-asserted-by":"crossref","unstructured":"X. Wei, T. Zhang, Y. Li, Y. Zhang and F. Wu, Multi-modality cross attention network for image and sentence matching, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10941\u201310950.","DOI":"10.1109\/CVPR42600.2020.01095"},{"key":"10.3233\/IDA-226740_ref19","doi-asserted-by":"crossref","unstructured":"K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"10.3233\/IDA-226740_ref20","doi-asserted-by":"crossref","unstructured":"M.I. Vasileva, B.A. Plummer, K. Dusad, S. Rajpal, R. Kumar and D. Forsyth, Learning type-aware embeddings for fashion compatibility, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 390\u2013405.","DOI":"10.1007\/978-3-030-01270-0_24"},{"key":"10.3233\/IDA-226740_ref21","doi-asserted-by":"crossref","unstructured":"J. Dong, X. Li, C. Xu, S. Ji, Y. He, G. Yang and X. Wang, Dual encoding for zero-example video retrieval, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9346\u20139355.","DOI":"10.1109\/CVPR.2019.00957"},{"key":"10.3233\/IDA-226740_ref22","doi-asserted-by":"crossref","unstructured":"X. Zou, X. Kong, W. Wong, C. Wang, Y. Liu and Y. Cao, Fashionai: A hierarchical dataset for fashion understanding, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 296\u2013304.","DOI":"10.1109\/CVPRW.2019.00039"},{"key":"10.3233\/IDA-226740_ref23","doi-asserted-by":"crossref","unstructured":"X. Song, F. Feng, J. Liu, Z. Li, L. Nie and J. Ma, Neurostylist: Neural compatibility modeling for clothing matching, in: Proceedings of the 25th ACM International Conference on Multimedia, 2017, pp. 753\u2013761.","DOI":"10.1145\/3123266.3123314"},{"key":"10.3233\/IDA-226740_ref24","doi-asserted-by":"crossref","unstructured":"X. Han, Z. Wu, P.X. Huang, X. Zhang, M. Zhu, Y. Li, Y. Zhao and L.S. Davis, Automatic spatially-aware fashion concept discovery, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1463\u20131471.","DOI":"10.1109\/ICCV.2017.163"}],"container-title":["Intelligent Data Analysis"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDA-226740","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:20:04Z","timestamp":1777454404000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDA-226740"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,18]]},"references-count":24,"journal-issue":{"issue":"3"},"URL":"https:\/\/doi.org\/10.3233\/ida-226740","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"value":"1088-467X","type":"print"},{"value":"1571-4128","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,18]]}}}