{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T16:00:14Z","timestamp":1776182414313,"version":"3.50.1"},"reference-count":61,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2018,8,7]],"date-time":"2018-08-07T00:00:00Z","timestamp":1533600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Due to the specific characteristics and complicated contents of remote sensing (RS) images, remote sensing image retrieval (RSIR) is always an open and tough research topic in the RS community. There are two basic blocks in RSIR, including feature learning and similarity matching. In this paper, we focus on developing an effective feature learning method for RSIR. With the help of the deep learning technique, the proposed feature learning method is designed under the bag-of-words (BOW) paradigm. Thus, we name the obtained feature deep BOW (DBOW). The learning process consists of two parts, including image descriptor learning and feature construction. First, to explore the complex contents within the RS image, we extract the image descriptor in the image patch level rather than the whole image. In addition, instead of using the handcrafted feature to describe the patches, we propose the deep convolutional auto-encoder (DCAE) model to deeply learn the discriminative descriptor for the RS image. Second, the k-means algorithm is selected to generate the codebook using the obtained deep descriptors. Then, the final histogrammic DBOW features are acquired by counting the frequency of the single code word. When we get the DBOW features from the RS images, the similarities between RS images are measured using L1-norm distance. Then, the retrieval results can be acquired according to the similarity order. The encouraging experimental results counted on four public RS image archives demonstrate that our DBOW feature is effective for the RSIR task. Compared with the existing RS image features, our DBOW can achieve improved behavior on RSIR.<\/jats:p>","DOI":"10.3390\/rs10081243","type":"journal-article","created":{"date-parts":[[2018,8,7]],"date-time":"2018-08-07T11:20:23Z","timestamp":1533640823000},"page":"1243","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":97,"title":["Unsupervised Deep Feature Learning for Remote Sensing Image Retrieval"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1375-0778","authenticated-orcid":false,"given":"Xu","family":"Tang","sequence":"first","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi\u2019an 710071, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0379-2042","authenticated-orcid":false,"given":"Xiangrong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi\u2019an 710071, China"}]},{"given":"Fang","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China"}]},{"given":"Licheng","family":"Jiao","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center for Intelligent Perception and Computation, Joint International Research Laboratory of Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi\u2019an 710071, China"}]}],"member":"1968","published-online":{"date-parts":[[2018,8,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2012.09.010","article-title":"A review of EO image information mining","volume":"75","author":"Quartulli","year":"2013","journal-title":"J. Photogram. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2923","DOI":"10.1109\/TGRS.2003.817197","article-title":"Information mining in remote sensing image archives: System concepts","volume":"41","author":"Datcu","year":"2003","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1145\/1348246.1348248","article-title":"Image retrieval: Ideas, influences, and trends of the new age","volume":"40","author":"Datta","year":"2008","journal-title":"ACM Comput. Surv."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1109\/34.955109","article-title":"SIMPLIcity: Semantics-sensitive integrated matching for picture libraries","volume":"23","author":"Wang","year":"2001","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1109\/TIP.2005.849770","article-title":"CLUE: Cluster-based retrieval of images by unsupervised learning","volume":"14","author":"Chen","year":"2005","journal-title":"IEEE Trans. Image Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"4014","DOI":"10.1109\/TCYB.2016.2591583","article-title":"Deep multimodal distance metric learning using click constraints for image ranking","volume":"47","author":"Yu","year":"2017","journal-title":"IEEE Trans. Cybern."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1343","DOI":"10.1080\/01431161.2017.1399472","article-title":"Visual descriptors for content-based retrieval of remote-sensing images","volume":"39","author":"Napoletano","year":"2018","journal-title":"Int. J. Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"5148","DOI":"10.1109\/TGRS.2017.2702596","article-title":"Remote sensing scene classification by unsupervised representation learning","volume":"55","author":"Lu","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"6805","DOI":"10.1109\/TGRS.2017.2734697","article-title":"Unsupervised-Restricted Deconvolutional Neural Network for Very High Resolution Remote-Sensing Image Classification","volume":"55","author":"Tao","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","unstructured":"Wang, Q., He, X., and Li, X. (2018). Locality and Structure Regularized Low Rank Representation for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens., 1\u201313."},{"key":"ref_11","first-page":"1","article-title":"Optimal Clustering Framework for Hyperspectral Band Selection","volume":"PP","author":"Wang","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"892","DOI":"10.1109\/TGRS.2015.2469138","article-title":"Hashing-based scalable remote sensing image search and retrieval in large archives","volume":"54","author":"Demir","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"610","DOI":"10.1109\/TSMC.1973.4309314","article-title":"Textural features for image classification","volume":"SMC-3","author":"Haralick","year":"1973","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2126","DOI":"10.1109\/TGRS.2008.918647","article-title":"Spectral clustering ensemble applied to SAR image segmentation","volume":"46","author":"Zhang","year":"2008","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","unstructured":"Manjunath, B.S., Salembier, P., and Sikora, T. (2002). Introduction to MPEG-7: Multimedia Content Description Interface, John Wiley & Sons."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sivic, J., and Zisserman, A. (2003, January 13\u201316). Video Google: A text retrieval approach to object matching in videos. Proceedings of the 9th IEEE International Conference on Computer Vision, Nice, France.","DOI":"10.1109\/ICCV.2003.1238663"},{"key":"ref_17","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_18","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1109\/TNNLS.2015.2435783","article-title":"Change detection in synthetic aperture radar images based on deep neural networks","volume":"27","author":"Gong","year":"2016","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1928","DOI":"10.1109\/LGRS.2017.2737823","article-title":"Recursive Autoencoders-Based Unsupervised Feature Learning for Hyperspectral Image Classification","volume":"14","author":"Zhang","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1109\/TGRS.2017.2750220","article-title":"Deep Multiple Instance Learning-Based Spatial-Spectral Classification for PAN and MS Imagery","volume":"56","author":"Liu","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_22","unstructured":"Wang, Q., Yuan, Z., and Li, X. (2018). GETNET: A General End-to-end Two-dimensional CNN Framework for Hyperspectral Image Change Detection. IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1109\/TGRS.2012.2205158","article-title":"Geographic image retrieval using local invariant features","volume":"51","author":"Yang","year":"2013","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S. (2010, January 3\u20135). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA.","DOI":"10.1145\/1869790.1869829"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Bai, Y., Yu, W., Xiao, T., Xu, C., Yang, K., Ma, W.Y., and Zhao, T. (2014, January 3\u20137). Bag-of-words based deep neural network for image retrieval. Proceedings of the 22nd ACM international conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2656402"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Leskovec, J., Rajaraman, A., and Ullman, J.D. (2014). Mining of Massive Datasets, Cambridge University Press.","DOI":"10.1017\/CBO9781139924801"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1431","DOI":"10.1109\/36.718847","article-title":"Spatial information retrieval from remote-sensing images. I. Information theoretical perspective","volume":"36","author":"Datcu","year":"1998","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1446","DOI":"10.1109\/36.718848","article-title":"Spatial information retrieval from remote-sensing images. II. Gibbs-Markov random fields","volume":"36","author":"Schroder","year":"1998","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Daschiel, H., and Datcu, M.P. (2003, January 13). Cluster structure evaluation of dyadic k-means for mining large image archives. Proceedings of the Image and Signal Processing for Remote Sens. VIII. International Society for Optics and Photonics, Crete, Greece.","DOI":"10.1117\/12.463151"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1109\/TGRS.2006.890579","article-title":"GeoIRIS: Geospatial information retrieval and indexing system\u2014Content mining, semantics modeling, and complex queries","volume":"45","author":"Shyu","year":"2007","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201327). Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"3023","DOI":"10.1109\/TGRS.2013.2268736","article-title":"Remote sensing image retrieval with global morphological texture descriptors","volume":"52","author":"Aptoula","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Pham, M.T., Mercier, G., Regniers, O., and Michel, J. (2016). Texture retrieval from VHR optical remote sensed images using the local extrema descriptor with application to vineyard parcel detection. Remote Sens., 8.","DOI":"10.3390\/rs8050368"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1080\/2150704X.2015.1074756","article-title":"High-resolution remote-sensing imagery retrieval using sparse features by auto-encoder","volume":"6","author":"Zhou","year":"2015","journal-title":"Remote Sens. Lett."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, Y., Tao, C., and Zhu, H. (2016). Content-based high-resolution remote sensing image retrieval via unsupervised feature learning and collaborative affinity metric fusion. Remote Sens., 8.","DOI":"10.3390\/rs8090709"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhou, W., Newsam, S., Li, C., and Shao, Z. (2017). Learning low dimensional convolutional neural networks for high-resolution remote sensing image retrieval. Remote Sens., 9.","DOI":"10.3390\/rs9050489"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"950","DOI":"10.1109\/TGRS.2017.2756911","article-title":"Large-scale remote sensing image retrieval by deep hashing neural networks","volume":"56","author":"Li","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_39","unstructured":"Datta, R., Li, J., Parulekar, A., and Wang, J.Z. (2006). Scalable remotely sensed image mining using supervised learning and content-based retrieval. Tech. Rep. CSE, 6\u201319. Available online: https:\/\/pdfs.semanticscholar.org\/4438\/012b243ad7e1741e0e111d68b0b5d3ce31f0.pdf."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"3876","DOI":"10.1109\/JSTARS.2015.2429137","article-title":"SAR images retrieval based on semantic classification and region-based similarity measure for earth observation","volume":"8","author":"Jiao","year":"2015","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1824","DOI":"10.1109\/JSTARS.2017.2664119","article-title":"SAR image content retrieval based on fuzzy similarity and relevance feedback","volume":"10","author":"Tang","year":"2017","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1603","DOI":"10.1109\/TGRS.2010.2088404","article-title":"Entropy-balanced bitmap tree for shape-based object retrieval from large-scale satellite imagery databases","volume":"49","author":"Scott","year":"2011","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"6020","DOI":"10.1109\/TGRS.2016.2579648","article-title":"A three-layered graph-based learning approach for remote sensing image retrieval","volume":"54","author":"Wang","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1109\/LGRS.2016.2636819","article-title":"Fusion similarity-based reranking for SAR image retrieval","volume":"14","author":"Tang","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"5798","DOI":"10.1109\/TGRS.2017.2714676","article-title":"Two-stage reranking for remote sensing image retrieval","volume":"55","author":"Tang","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Masci, J., Meier, U., Cire\u015fan, D., and Schmidhuber, J. (2011, January 14\u201317). Stacked convolutional auto-encoders for hierarchical feature extraction. Proceedings of the International Conference on Artificial Neural Networks, Espoo, Finland.","DOI":"10.1007\/978-3-642-21735-7_7"},{"key":"ref_47","unstructured":"Csurka, G., Dance, C., Fan, L., Willamowski, J., and Bray, C. (2004, January 11\u201314). Visual categorization with bags of keypoints. Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1109\/LGRS.2015.2513443","article-title":"Bag-of-visual-words scene classifier with local and global features for high spatial resolution remote sensing imagery","volume":"13","author":"Zhu","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"14988","DOI":"10.3390\/rs71114988","article-title":"A comparative study of sampling analysis in the scene classification of optical high-spatial resolution remote sensing imagery","volume":"7","author":"Hu","year":"2015","journal-title":"Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Veksler, O., Boykov, Y., and Mehrani, P. (2010, January 5\u201311). Superpixels and supervoxels in an energy optimization framework. Proceedings of the 11th European Conference on Computer Vision, Heraklion, Crete, Greece.","DOI":"10.1007\/978-3-642-15555-0_16"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Achanta, R., and S\u00fcsstrunk, S. (2017, January 21\u201326). Superpixels and polygons using simple non-iterative clustering. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.520"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"2274","DOI":"10.1109\/TPAMI.2012.120","article-title":"SLIC superpixels compared to state-of-the-art superpixel methods","volume":"34","author":"Achanta","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Lee, H., Grosse, R., Ranganath, R., and Ng, A.Y. (2009, January 14\u201318). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553453"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"2108","DOI":"10.1109\/TGRS.2015.2496185","article-title":"Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery","volume":"54","author":"Zhao","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Zhao, B., Zhong, Y., Zhang, L., and Huang, B. (2016). The Fisher kernel coding framework for high spatial resolution scene classification. Remote Sens., 8.","DOI":"10.3390\/rs8020157"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1109\/JPROC.2017.2675998","article-title":"Remote sensing image scene classification: Benchmark and state of the art","volume":"105","author":"Cheng","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_57","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_58","unstructured":"Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1109\/TPAMI.2011.219","article-title":"Kernelized locality-sensitive hashing","volume":"34","author":"Kulis","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1109\/LGRS.2015.2499239","article-title":"Deep learning earth observation classification using ImageNet pretrained networks","volume":"13","author":"Marmanis","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Li, F. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/10\/8\/1243\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:17:11Z","timestamp":1760195831000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/10\/8\/1243"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,7]]},"references-count":61,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2018,8]]}},"alternative-id":["rs10081243"],"URL":"https:\/\/doi.org\/10.3390\/rs10081243","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,8,7]]}}}