{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T16:53:22Z","timestamp":1771260802920,"version":"3.50.1"},"reference-count":57,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2024,4,9]],"date-time":"2024-04-09T00:00:00Z","timestamp":1712620800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100010418","name":"Grand Information Technology Research Support Program","doi-asserted-by":"publisher","award":["IITP-2024-2020-0-01462"],"award-info":[{"award-number":["IITP-2024-2020-0-01462"]}],"id":[{"id":"10.13039\/501100010418","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010418","name":"Grand Information Technology Research Support Program","doi-asserted-by":"publisher","award":["2021-0-02068"],"award-info":[{"award-number":["2021-0-02068"]}],"id":[{"id":"10.13039\/501100010418","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010418","name":"Grand Information Technology Research Support Program","doi-asserted-by":"publisher","award":["K-22-L04-C05-S01"],"award-info":[{"award-number":["K-22-L04-C05-S01"]}],"id":[{"id":"10.13039\/501100010418","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Institute of Information & Communications Technology Planning & Evaluation (IITP)","award":["IITP-2024-2020-0-01462"],"award-info":[{"award-number":["IITP-2024-2020-0-01462"]}]},{"name":"Institute of Information & Communications Technology Planning & Evaluation (IITP)","award":["2021-0-02068"],"award-info":[{"award-number":["2021-0-02068"]}]},{"name":"Institute of Information & Communications Technology Planning & Evaluation (IITP)","award":["K-22-L04-C05-S01"],"award-info":[{"award-number":["K-22-L04-C05-S01"]}]},{"name":"Korea Institute of Science and Technology 
(KISTI)","award":["IITP-2024-2020-0-01462"],"award-info":[{"award-number":["IITP-2024-2020-0-01462"]}]},{"name":"Korea Institute of Science and Technology (KISTI)","award":["2021-0-02068"],"award-info":[{"award-number":["2021-0-02068"]}]},{"name":"Korea Institute of Science and Technology (KISTI)","award":["K-22-L04-C05-S01"],"award-info":[{"award-number":["K-22-L04-C05-S01"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The objective of content-based image retrieval (CBIR) is to locate samples from a database that are akin to a query, relying on the content embedded within the images. A contemporary strategy involves calculating the similarity between compact vectors by encoding both the query and the database images as global descriptors. In this work, we propose an image retrieval method by using hierarchical K-means clustering to efficiently organize the image descriptors within the database, which aims to optimize the subsequent retrieval process. Then, we compute the similarity between the descriptor set within the leaf nodes and the query descriptor to rank them accordingly. Three tree search algorithms are presented to enable a trade-off between search accuracy and speed that allows for substantial gains at the expense of a slightly reduced retrieval accuracy. Our proposed method demonstrates enhancement in image retrieval speed when applied to the CLIP-based model, UNICOM, designed for category-level retrieval, as well as the CNN-based R-GeM model, tailored for particular object retrieval by validating its effectiveness across various domains and backbones. 
We achieve an 18-times speed improvement while preserving over 99% accuracy when applied to the In-Shop dataset, the largest dataset in the experiments.<\/jats:p>","DOI":"10.3390\/s24082401","type":"journal-article","created":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T03:07:48Z","timestamp":1712718468000},"page":"2401","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Efficient Image Retrieval Using Hierarchical K-Means Clustering"],"prefix":"10.3390","volume":"24","author":[{"given":"Dayoung","family":"Park","sequence":"first","affiliation":[{"name":"Department of Control and Robot Engineering, Chungbuk National University, Cheongju 28644, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3400-0493","authenticated-orcid":false,"given":"Youngbae","family":"Hwang","sequence":"additional","affiliation":[{"name":"Department of Control and Robot Engineering, Chungbuk National University, Cheongju 28644, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,4,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Gordo, A., Rodriguez-Serrano, J.A., Perronnin, F., and Valveny, E. (2012, January 6\u201321). Leveraging category-level labels for instance-level image retrieval. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248035"},{"key":"ref_2","unstructured":"Tolias, G., Sicre, R., and J\u00e9gou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv."},{"key":"ref_3","first-page":"1655","article-title":"Fine-tuning CNN image retrieval with no human annotation","volume":"41","author":"Tolias","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"7270","DOI":"10.1109\/TPAMI.2022.3218591","article-title":"Deep learning for instance retrieval: A survey","volume":"45","author":"Chen","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","unstructured":"El-Nouby, A., Neverova, N., Laptev, I., and J\u00e9gou, H. (2021). Training vision transformers for image retrieval. arXiv."},{"key":"ref_6","unstructured":"An, X., Deng, J., Yang, K., Li, J., Feng, Z., Guo, J., Yang, J., and Liu, T. (2023). Unicom: Universal and Compact Representation Learning for Image Retrieval. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gong, Y., Wang, L., Guo, R., and Lazebnik, S. (2014, January 6\u201312). Multi-scale orderless pooling of deep convolutional activation features. Proceedings of the Computer Vision\u2014ECCV 2014: 13th European Conference, Zurich, Switzerland. Proceedings, Part VII 13.","DOI":"10.1007\/978-3-319-10584-0_26"},{"key":"ref_8","unstructured":"Babenko, A., and Lempitsky, V. (2015, January 7\u201312). Aggregating local deep features for image retrieval. Proceedings of the IEEE International Conference on Computer Vision, Boston, MA, USA."},{"key":"ref_9","first-page":"251","article-title":"Visual instance retrieval with deep convolutional networks","volume":"4","author":"Razavian","year":"2016","journal-title":"ITE Trans. Media Technol. Appl."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kalantidis, Y., Mellina, C., and Osindero, S. (2016, January 8\u201310). Cross-dimensional weighting for aggregated deep convolutional features. Proceedings of the Computer Vision\u2013ECCV 2016 Workshops: Amsterdam, The Netherlands. Proceedings, Part I 14.","DOI":"10.1007\/978-3-319-46604-0_48"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Babenko, A., Slesarev, A., Chigorin, A., and Lempitsky, V. (2014, January 6\u201312). Neural codes for image retrieval. 
Proceedings of the Computer Vision\u2014ECCV 2014: 13th European Conference, Zurich, Switzerland. Proceedings, Part I 13.","DOI":"10.1007\/978-3-319-10590-1_38"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Gordo, A., Almaz\u00e1n, J., Revaud, J., and Larlus, D. (2016, January 11\u201314). Deep image retrieval: Learning global representations for image search. Proceedings of the Computer Vision\u2014ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Proceedings, Part VI 14.","DOI":"10.1007\/978-3-319-46466-4_15"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gkelios, S., Boutalis, Y., and Chatzichristofis, S.A. (2021, January 7\u20139). Investigating the vision transformer model for image retrieval tasks. Proceedings of the 2021 17th International Conference on Distributed Computing in Sensor Systems (DCOSS), Pafos, Cyprus.","DOI":"10.1109\/DCOSS52077.2021.00065"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Tan, F., Yuan, J., and Ordonez, V. (2021, January 20\u201325). Instance-level image retrieval using reranking transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.01189"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Song, C.H., Yoon, J., Choi, S., and Avrithis, Y. (2023, January 3\u20138). Boosting vision transformers for image retrieval. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00019"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.cviu.2019.04.004","article-title":"Siamese graph convolutional network for content based remote sensing image retrieval","volume":"184","author":"Chaudhuri","year":"2019","journal-title":"Comput. Vis. 
Image Underst."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2164","DOI":"10.1109\/TMM.2022.3143694","article-title":"Deep graph convolutional quantization networks for image retrieval","volume":"25","author":"Wang","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"114940","DOI":"10.1016\/j.eswa.2021.114940","article-title":"Deep convolutional features for image retrieval","volume":"177","author":"Gkelios","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lin, K., Yang, H.F., Hsiao, J.H., and Chen, C.S. (2015, January 7\u201312). Deep learning of binary hash codes for fast image retrieval. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301269"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Varga, D., and Szir\u00e1nyi, T. (2016, January 9\u201312). Fast content-based image retrieval using convolutional neural network and hash function. Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary.","DOI":"10.1109\/SMC.2016.7844637"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1927469","DOI":"10.1080\/23311916.2021.1927469","article-title":"Content-based image retrieval: A review of recent trends","volume":"8","author":"Hameed","year":"2021","journal-title":"Cogent Eng."},{"key":"ref_22","unstructured":"Nister, D., and Stewenius, H. (2006, January 17\u201322). Scalable recognition with a vocabulary tree. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201906), New York, NY, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Fadaei, S., Rashno, A., and Rashno, E. (2019, January 18\u201319). Content-based image retrieval speedup. 
Proceedings of the 2019 5th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), Shahrood, Iran.","DOI":"10.1109\/ICSPIS48872.2019.9066132"},{"key":"ref_24","first-page":"9087","article-title":"Efficient image retrieval via decoupling diffusion into online and offline processing","volume":"33","author":"Yang","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bay, H., Tuytelaars, T., and Van Gool, L. (2006, January 7\u201313). Surf: Speeded up robust features. Proceedings of the Computer Vision\u2013ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria. Proceedings, Part I 9.","DOI":"10.1007\/11744023_32"},{"key":"ref_27","unstructured":"Ledwich, L., and Williams, S. (2004, January 6\u20138). Reduced SIFT features for image retrieval and indoor localisation. Proceedings of the Australian Conference on Robotics and Automation, Canberra, Australia."},{"key":"ref_28","unstructured":"Yuan, X., Yu, J., Qin, Z., and Wan, T. (2011, January 11\u201314). A SIFT-LBP image retrieval model based on bag of features. Proceedings of the IEEE International Conference on Image Processing, Brussels, Belgium."},{"key":"ref_29","first-page":"1","article-title":"Content-based image retrieval using SURF and colour moments","volume":"11","author":"Velmurugan","year":"2011","journal-title":"Glob. J. Comput. Sci. Technol."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Bakar, S.A., Hitam, M.S., and Yussof, W.N.J.H.W. (2013, January 8\u201310). Content-based image retrieval using SIFT for binary and greyscale images. 
Proceedings of the 2013 IEEE International Conference on Signal and Image Processing Applications, Melaka, Malaysia.","DOI":"10.1109\/ICSIPA.2013.6707982"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ali, N., Bajwa, K.B., Sablatnig, R., Chatzichristofis, S.A., Iqbal, Z., Rashid, M., and Habib, H.A. (2016). A novel image retrieval based on visual words integration of SIFT and SURF. PLoS ONE, 11.","DOI":"10.1371\/journal.pone.0157428"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2725","DOI":"10.1007\/s00521-018-3677-9","article-title":"Content-based image retrieval system using ORB and SIFT features","volume":"32","author":"Chhabra","year":"2020","journal-title":"Neural Comput. Appl."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sivic, Z. (2003, January 18\u201320). Video Google: A text retrieval approach to object matching in videos. Proceedings of the Ninth IEEE International Conference on Computer Vision, Madison, WI, USA.","DOI":"10.1109\/ICCV.2003.1238663"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1080\/00437956.1954.11659520","article-title":"Distributional structure","volume":"10","author":"Harris","year":"1954","journal-title":"Word"},{"key":"ref_35","unstructured":"Jun, H., Ko, B., Kim, Y., Kim, I., and Kim, J. (2019). Combination of multiple global descriptors for image retrieval. arXiv."},{"key":"ref_36","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201324). Learning transferable visual models from natural language supervision. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_37","first-page":"1","article-title":"Text2light: Zero-shot text-driven hdr panorama generation","volume":"41","author":"Chen","year":"2022","journal-title":"ACM Trans. Graph. TOG"},{"key":"ref_38","unstructured":"Lin, J., and Gong, S. (2023). 
GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning. arXiv."},{"key":"ref_39","unstructured":"Bhat, A., and Jain, S. (2023). Face Recognition in the age of CLIP & Billion image datasets. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Rao, Y., Zhao, W., Chen, G., Tang, Y., Zhu, Z., Huang, G., Zhou, J., and Lu, J. (2022, January 18\u201324). Denseclip: Language-guided dense prediction with context-aware prompting. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01755"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Baldrati, A., Bertini, M., Uricchio, T., and Del Bimbo, A. (2022, January 18\u201324). Effective conditioned and composed image retrieval combining clip-based features. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.02080"},{"key":"ref_42","unstructured":"Ilharco, G., Wortsman, M., Wightman, R., Gordon, C., Carlini, N., Taori, R., Dave, A., Shankar, V., Namkoong, H., and Miller, J. (2024, January 12). OpenCLIP. Available online: https:\/\/zenodo.org\/records\/5143773."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Kov\u00e1cs, L. (2013, January 17\u201319). Parallel multi-tree indexing for evaluating large descriptor sets. Proceedings of the 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), Veszprem, Hungary.","DOI":"10.1109\/CBMI.2013.6576581"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1145\/362003.362025","article-title":"Some approaches to best-match file searching","volume":"16","author":"Burkhard","year":"1973","journal-title":"Commun. 
ACM"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1016\/j.imavis.2004.02.006","article-title":"Robust wide-baseline stereo from maximally stable extremal regions","volume":"22","author":"Matas","year":"2004","journal-title":"Image Vis. Comput."},{"key":"ref_46","unstructured":"Murphy, W.E. (2014). Large Scale Hierarchical K-Means Based Image Retrieval With MapReduce. [Master\u2019s Thesis, Air Force Institute of Technology]."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Zhao, C.Y., Shi, B.X., Zhang, M.X., and Shang, Z.W. (2010, January 11\u201314). Image retrieval based on improved hierarchical clustering algorithm. Proceedings of the 2010 International Conference on Wavelet Analysis and Pattern Recognition, Qingdao, China.","DOI":"10.1109\/ICWAPR.2010.5576314"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Mantena, G., and Anguera, X. (2013, January 26\u201331). Speed improvements to information retrieval-based dynamic time warping using hierarchical k-means clustering. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6639327"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.knosys.2013.05.003","article-title":"A sample-based hierarchical adaptive K-means clustering method for large-scale video retrieval","volume":"49","author":"Liao","year":"2013","journal-title":"Knowl.-Based Syst."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Guo, X., Cao, X., Zhang, J., and Li, X. (2009, January 23\u201327). Mift: A mirror reflection invariant feature descriptor. Proceedings of the Computer Vision\u2014ACCV 2009: 9th Asian Conference on Computer Vision, Xi\u2019an, China. Revised Selected Papers, Part II 9.","DOI":"10.1007\/978-3-642-12304-7_50"},{"key":"ref_51","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. (2011). 
The Caltech-UCSD Birds-200-2011 Dataset, California Institute of Technology. Technical Report CNS-TR-2011-001."},{"key":"ref_52","unstructured":"Krause, J., Deng, J., Stark, M., and Fei-Fei, L. (2024, January 12). Collecting a Large-Scale Dataset of Fine-Grained Cars. Available online: https:\/\/ai.stanford.edu\/~jkrause\/papers\/fgvc13.pdf."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Liu, Z., Luo, P., Qiu, S., Wang, X., and Tang, X. (2016, January 27\u201330). Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.124"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. (2007, January 17\u201322). Object retrieval with large vocabularies and fast spatial matching. Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA.","DOI":"10.1109\/CVPR.2007.383172"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. (2008, January 23\u201328). Lost in quantization: Improving particular object retrieval in large scale image databases. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587635"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Radenovi\u0107, F., Iscen, A., Tolias, G., Avrithis, Y., and Chum, O. (2018, January 18\u201323). Revisiting oxford and paris: Large-scale image retrieval benchmarking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00598"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/8\/2401\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:25:20Z","timestamp":1760106320000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/8\/2401"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,9]]},"references-count":57,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["s24082401"],"URL":"https:\/\/doi.org\/10.3390\/s24082401","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,9]]}}}