{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T00:33:49Z","timestamp":1760402029661,"version":"build-2065373602"},"reference-count":54,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2020,1,8]],"date-time":"2020-01-08T00:00:00Z","timestamp":1578441600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005073","name":"Agency for Defense Development","doi-asserted-by":"publisher","award":["UC160016FD"],"award-info":[{"award-number":["UC160016FD"]}],"id":[{"id":"10.13039\/501100005073","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Remote sensing image retrieval (RSIR) is the process of searching for identical areas by investigating the similarities between a query image and the database images. RSIR is a challenging task owing to the time difference, viewpoint, and coverage area depending on the shooting circumstance, resulting in variations in the image contents. In this paper, we propose a novel method based on a coarse-to-fine strategy, which makes a deep network more robust to the variations in remote sensing images. Moreover, we propose a new triangular loss function to consider the whole relation within the tuple. This loss function improves the retrieval performance and demonstrates better performance in terms of learning the detailed information in complex remote sensing images. To verify our methods, we experimented with the Google Earth South Korea dataset, which contains 40,000 images, using the evaluation metric Recall@n. In all experiments, we obtained better performance results than those of the existing retrieval training methods. Our source code and Google Earth South Korea dataset are available online.<\/jats:p>","DOI":"10.3390\/rs12020219","type":"journal-article","created":{"date-parts":[[2020,1,9]],"date-time":"2020-01-09T03:07:11Z","timestamp":1578539231000},"page":"219","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Coarse-to-Fine Deep Metric Learning for Remote Sensing Image Retrieval"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4320-7792","authenticated-orcid":false,"given":"Min-Sub","family":"Yun","sequence":"first","affiliation":[{"name":"Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6548-4486","authenticated-orcid":false,"given":"Woo-Jeoung","family":"Nam","sequence":"additional","affiliation":[{"name":"Department of Computer and Radio Communication Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6249-4996","authenticated-orcid":false,"given":"Seong-Whan","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Brain and Cognitive Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea"},{"name":"Department of Computer and Radio Communication Engineering, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea"},{"name":"Department of Artificial Intelligence, Korea University, Anam-dong, Seongbuk-gu, Seoul 02841, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2020,1,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2811","DOI":"10.1109\/TGRS.2017.2783902","article-title":"When deep learning meets metric learning: Remote sensing image scene classification via learning discriminative CNNs","volume":"56","author":"Cheng","year":"2018","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Mahdianpari, M., Salehi, B., Rezaee, M., Mohammadimanesh, F., and Zhang, Y. (2018). Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery. Remote. Sens., 10.","DOI":"10.3390\/rs10071119"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MGRS.2017.2762307","article-title":"Deep learning in remote sensing: A comprehensive review and list of resources","volume":"5","author":"Zhu","year":"2017","journal-title":"IEEE Geosci. Remote. Sens. Mag."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Xiong, W., Lv, Y., Cui, Y., Zhang, X., and Gu, X. (2019). A discriminative feature learning approach for remote sensing image retrieval. Remote. Sens., 11.","DOI":"10.3390\/rs11030281"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Cao, R., Zhang, Q., Zhu, J., Li, Q., Li, Q., Liu, B., and Qiu, G. (2019). Enhancing remote sensing image retrieval with triplet deep metric learning network. arXiv.","DOI":"10.1080\/2150704X.2019.1647368"},{"key":"ref_6","unstructured":"Zhou, W., Deng, X., and Shao, Z. (2018). Region convolutional features for multi-label remote sensing image retrieval. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhou, W., Newsam, S., Li, C., and Shao, Z. (2017). Learning low dimensional convolutional neural networks for high-resolution remote sensing image retrieval. Remote. Sens., 9.","DOI":"10.3390\/rs9050489"},{"key":"ref_8","unstructured":"Yue-Hei Ng, J., Yang, F., and Davis, L.S. (2015, January 7\u201313). Exploiting local features from deep networks for image retrieval. Proceedings of the IEEE Conference on Computer vision and Pattern Recognition Workshops, Santiago, Chile."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sattler, T., Weyand, T., Leibe, B., and Kobbelt, L. (2012, January 3\u20137). Image retrieval for image-based localization revisited. Proceedings of the British Machine Vision Conference, Surrey, UK.","DOI":"10.5244\/C.26.76"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wan, J., Wang, D., Hoi, S.C.H., Wu, P., Zhu, J., Zhang, Y., and Li, J. (2014, January 3\u20137). Deep learning for content-based image retrieval: A comprehensive study. Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654948"},{"key":"ref_11","unstructured":"Maeng, H., Liao, S., Kang, D., Lee, S.-W., and Jain, A.K. (2012, January 5\u20139). Nighttime face recognition at long distance: Cross-distance and cross-spectral matching. Proceedings of the 11th Asian Conference on Computer Vision, Daejeon, Korea."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2373","DOI":"10.1016\/j.patcog.2005.01.015","article-title":"Tracking non-rigid objects using probabilistic Hausdorff distance matching","volume":"38","author":"Park","year":"2005","journal-title":"Pattern Recognit."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"767","DOI":"10.1016\/j.patcog.2003.07.012","article-title":"Qualitative estimation of camera motion parameters from the linear composition of optical flow","volume":"37","author":"Park","year":"2004","journal-title":"Pattern Recognit."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Roh, H.-K., and Lee, S.-W. (2000, January 15\u201317). Multiple people tracking using an appearance model based on temporal color. Proceedings of the International Workshop on Biologically Motivated Computer Vision, Seoul, Korea.","DOI":"10.1007\/3-540-45482-9_37"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/S0893-6080(00)00086-1","article-title":"A synthesis procedure for associative memories based on space-varying cellular neural networks","volume":"14","author":"Park","year":"2001","journal-title":"Neural Netw."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1016\/j.patcog.2006.06.014","article-title":"Accurate object contour tracking based on boundary edge selection","volume":"40","author":"Roh","year":"2007","journal-title":"Pattern Recognit."},{"key":"ref_17","unstructured":"Xi, D., Podolak, I.T., and Lee, S.-W. (2003, January 9\u201311). Facial component extraction and face recognition with support vector machines. Proceedings of the Fifth IEEE International Conference on Automatic Face Gesture Recognition, Guildford, UK."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Suk, H.-I., Sin, B.-K., and Lee, S.-W. (2008, January 17\u201319). Recognizing hand gestures using dynamic bayesian network. Proceedings of the 2008 8th IEEE International Conference on Automatic Face & Gesture Recognit, Amsterdam, The Netherlands.","DOI":"10.1109\/AFGR.2008.4813342"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1016\/j.patrec.2009.11.017","article-title":"View-independent human action recognition with volume motion template on single stereo camera","volume":"31","author":"Roh","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1665","DOI":"10.1109\/TIFS.2013.2261061","article-title":"Face tracking and recognition at a distance: A coaxial and concentric PTZ camera system","volume":"8","author":"Park","year":"2013","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_21","unstructured":"Jung, H.-C., Hwang, B.-W., and Lee, S.-W. (2004, January 15\u201317). Authenticating corrupted face image based on noise model. Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Hong Kong, China."},{"key":"ref_22","unstructured":"Hwang, B.-W., Blanz, V., Vetter, T., and Lee, S.-W. (2000, January 15\u201317). Face reconstruction from a small number of feature points. Proceedings of the 15th International Conference on Pattern Recognition, Seoul, Korea."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/0893-6080(95)00022-4","article-title":"LVQ combined with simulated annealing for optimal design of large-set reference models","volume":"9","author":"Song","year":"1996","journal-title":"Neural Netw."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"932","DOI":"10.1109\/TCSVT.2011.2133570","article-title":"A network of dynamic probabilistic models for human interaction analysis","volume":"21","author":"Suk","year":"2011","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_25","unstructured":"Thrun, S., and Zlot, R. (2004, January 6\u20138). Reduced sift features for image retrieval and indoor localization. Proceedings of the Australian Conference on Robotics and Automation, Canberra, Australia."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_27","unstructured":"Pass, G., and Zabih, R. (1996, January 2\u20134). Histogram refinement for content-based image retrieval. Proceedings of the Third IEEE Workshop on Applications of Computer Vision, Sarasota, FL, USA."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3023","DOI":"10.1109\/TGRS.2013.2268736","article-title":"Remote sensing image retrieval with global morphological texture descriptors","volume":"52","author":"Aptoula","year":"2013","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 21\u201326). Mask r-cnn. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_30","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_31","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Son, J., Baek, M., Cho, M., and Han, B. (2017, January 21\u201326). Multi-object tracking with quadruplet convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.403"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1224","DOI":"10.1109\/TPAMI.2017.2709749","article-title":"SIFT meets CNN: A decade survey of instance retrieval","volume":"40","author":"Zheng","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S. (2010, January 2\u20135). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA.","DOI":"10.1145\/1869790.1869829"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A benchmark data set for performance evaluation of aerial scene classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Sun, B., Chen, C., Zhu, Y., and Jiang, J. (2019). GeoCapsNet: Aerial to Ground view Image Geo-localization using Capsule Network. arXiv.","DOI":"10.1109\/ICME.2019.00133"},{"key":"ref_38","unstructured":"Chopra, S., Hadsell, R., and LeCun, Y. (2005, January 20\u201326). Learning a similarity metric discriminatively, with application to face verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_39","unstructured":"Sun, Y., Chen, Y., Wang, X., and Tang, X. (2014, January 8\u201313). Deep learning face representation by joint identification-verification. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_40","unstructured":"Kumar, B., Carneiro, G., and Reid, I. (July, January 26). Learning local image descriptors with deep siamese and triplet convolutional networks by minimising global loss functions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Schroff, F., Kalenichenko, D., and Philbin, J. (2015, January 7\u201313). Facenet: A unified embedding for face recognition and clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Santiago, Chile.","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hoffer, E., and Ailon, N. (2015, January 12\u201314). Deep metric learning using triplet network. Proceedings of the International Workshop on Similarity-Based Pattern Recognition, Copenhagen, Denmark.","DOI":"10.1007\/978-3-319-24261-3_7"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Kim, S., Seo, M., Laptev, I., Cho, M., and Kwak, S. (2019, January 16\u201320). Deep Metric Learning Beyond Binary Supervision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00239"},{"key":"ref_44","unstructured":"Babenko, A., and Lempitsky, V. (2015, January 7\u201313). Aggregating local deep features for image retrieval. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Santiago, Chile."},{"key":"ref_45","unstructured":"Tolias, G., Sicre, R., and J\u00e9gou, H. (2015). Particular object retrieval with integral max-pooling of CNN activations. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Chen, W., Chen, X., Zhang, J., and Huang, K. (2017, January 21\u201326). Beyond triplet loss: a deep quadruplet network for person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.145"},{"key":"ref_47","unstructured":"Sohn, K. (2016, January 5\u201310). Improved deep metric learning with multi-class n-pair loss objective. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_48","first-page":"2305","article-title":"Cross-convolutional-layer pooling for image recognition","volume":"39","author":"Liu","year":"2016","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1016\/j.rse.2017.06.031","article-title":"Google Earth Engine: Planetary-scale geospatial analysis for everyone","volume":"202","author":"Gorelick","year":"2017","journal-title":"Remote. Sens. Environ."},{"key":"ref_51","unstructured":"Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., and Sivic, J. (July, January 26). NetVLAD: CNN architecture for weakly supervised place recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Torii, A., Sivic, J., Pajdla, T., and Okutomi, M. (2013, January 25\u201327). Visual place recognition with repetitive structures. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.119"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Gronat, P., Obozinski, G., Sivic, J., and Pajdla, T. (2013, January 23\u201328). Learning and calibrating per-location classifiers for visual place recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.122"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Fan, C., Lee, J., Xu, M., Kumar Singh, K., Jae Lee, Y., Crandall, D.J., and Ryoo, M.S. (2017, January 21\u201326). Identifying first-person camera wearers in third-person videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.503"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/2\/219\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:19:14Z","timestamp":1760361554000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/2\/219"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,8]]},"references-count":54,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2020,1]]}},"alternative-id":["rs12020219"],"URL":"https:\/\/doi.org\/10.3390\/rs12020219","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2020,1,8]]}}}