{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:48:55Z","timestamp":1760233735196,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,2,17]],"date-time":"2021-02-17T00:00:00Z","timestamp":1613520000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Scene understanding of remote sensing images is of great significance in various applications. Its fundamental problem is how to construct representative features. Various convolutional neural network architectures have been proposed for automatically learning features from images. However, is the current way of configuring the same architecture to learn all the data while ignoring the differences between images the right one? It seems to be contrary to our intuition: it is clear that some images are easier to recognize, and some are harder to recognize. This problem is the gap between the characteristics of the images and the learning features corresponding to specific network structures. Unfortunately, the literature so far lacks an analysis of the two. 
In this paper, we explore this problem from three aspects: first, we build a visual-based evaluation pipeline of scene complexity to characterize the intrinsic differences between images; second, we analyze the relationship between semantic concepts and feature representations, i.e., the scalability and hierarchy of features, which are the essential elements in CNNs of different architectures, for remote sensing scenes of different complexity; third, we introduce CAM, a visualization method that explains feature learning within neural networks, to analyze the relationship between scenes with different complexity and semantic feature representations. The experimental results show that a complex scene would need deeper and multi-scale features, whereas a simpler scene would need lower and single-scale features. In addition, the complex scene concept is more dependent on the joint semantic representation of multiple objects. Furthermore, we propose a framework of scene complexity prediction for an image and utilize it to design a depth and scale-adaptive model. 
It achieves higher performance but with fewer parameters than the original model, demonstrating the potential significance of scene complexity.<\/jats:p>","DOI":"10.3390\/rs13040742","type":"journal-article","created":{"date-parts":[[2021,2,17]],"date-time":"2021-02-17T21:35:42Z","timestamp":1613597742000},"page":"742","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Scene Complexity: A New Perspective on Understanding the Scene Semantics of Remote Sensing and Designing Image-Adaptive Convolutional Neural Networks"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1820-4015","authenticated-orcid":false,"given":"Jian","family":"Peng","sequence":"first","affiliation":[{"name":"School of Geosciences and Info-Physics, Central South University, Changsha 410083, China"}]},{"given":"Xiaoming","family":"Mei","sequence":"additional","affiliation":[{"name":"School of Geosciences and Info-Physics, Central South University, Changsha 410083, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0464-6955","authenticated-orcid":false,"given":"Wenbo","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Technology Innovation, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230088, China"}]},{"given":"Liang","family":"Hong","sequence":"additional","affiliation":[{"name":"School of Tourism and Geography, Yunnan Normal University, Kunming 650500, China"},{"name":"GIS Technology Research Center of Resource and Environment in Western China of Ministry of Education, Kunming 650500, China"},{"name":"School of Information Science and Technology, Yunnan Normal University, Kunming 650500, China"}]},{"given":"Bingyu","family":"Sun","sequence":"additional","affiliation":[{"name":"Institute of Technology Innovation, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230088, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1173-6593","authenticated-orcid":false,"given":"Haifeng","family":"Li","sequence":"additional","affiliation":[{"name":"School of Geosciences and Info-Physics, Central South University, Changsha 410083, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.isprsjprs.2017.07.010","article-title":"On support relations and semantic scene graphs","volume":"131","author":"Yang","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1109\/TPAMI.2013.185","article-title":"3d traffic scene understanding from movable platforms","volume":"36","author":"Geiger","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Baek, J., Chelu, I.V., Iordache, L., Paunescu, V., Ryu, H., Ghiuta, A., Petreanu, A., Soh, Y., Leica, A., and Jeon, B. (2018, January 18\u201322). Scene understanding networks for autonomous driving based on around view monitoring system. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00142"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_6","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Shen, L., Lin, Z., and Huang, Q. (2016). Relay backpropagation for effective learning of deep convolutional neural networks. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46478-7_29"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.isprsjprs.2017.05.002","article-title":"Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks","volume":"130","author":"Alshehhi","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2016.03.014","article-title":"A survey on object detection in optical remote sensing images","volume":"117","author":"Cheng","year":"2016","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1016\/S0031-3203(96)00068-4","article-title":"Object detection using Gabor filters","volume":"30","author":"Jain","year":"1997","journal-title":"Pattern Recognit."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1023\/A:1011139631724","article-title":"Modeling the shape of the scene: A holistic representation of the spatial envelope","volume":"42","author":"Oliva","year":"2001","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_13","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). 
Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"5455","DOI":"10.1007\/s10462-020-09825-6","article-title":"A survey of the recent architectures of deep convolutional neural networks","volume":"53","author":"Khan","year":"2020","journal-title":"Artif. Intell. Rev."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"14680","DOI":"10.3390\/rs71114680","article-title":"Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery","volume":"7","author":"Hu","year":"2015","journal-title":"Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2321","DOI":"10.1109\/LGRS.2015.2475299","article-title":"Deep learning based feature selection for remote sensing scene classification","volume":"12","author":"Zou","year":"2015","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Chen, C., Gong, W., Chen, Y., and Li, W. (2019). Object Detection in Remote Sensing Images Based on a Scene-Contextual Feature Pyramid Network. Remote Sens., 11.","DOI":"10.3390\/rs11030339"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Qu, H., Zhang, L., Wu, X., He, X., Hu, X., and Wen, X. (2019). Multiscale Object Detection in Infrared Streetscape Images Based on Deep Learning and Instance Level Data Augmentation. Appl. Sci., 9.","DOI":"10.3390\/app9030565"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, H., Li, J., He, L., and Wang, Y. (2019). Superpixel-Guided Layer-Wise Embedding CNN for Remote Sensing Image Classification. Remote Sens., 11.","DOI":"10.3390\/rs11020174"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xu, Y., Wu, L., Xie, Z., and Chen, Z. (2018). 
Building extraction in very high resolution remote sensing imagery using deep learning and guided filters. Remote Sens., 10.","DOI":"10.3390\/rs10010144"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Egli, S., and H\u00f6pke, M. (2020). CNN-Based Tree Species Classification Using High Resolution RGB Image Data from Automated UAV Observations. Remote Sens., 12.","DOI":"10.3390\/rs12233892"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Taoufiq, S., Nagy, B., and Benedek, C. (2020). HierarchyNet: Hierarchical CNN-Based Urban Building Classification. Remote Sens., 12.","DOI":"10.3390\/rs12223794"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2494","DOI":"10.1109\/TGRS.2018.2873966","article-title":"Scene classification using hierarchical Wasserstein CNN","volume":"57","author":"Liu","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1111\/1467-8659.00331","article-title":"An Information Theory Framework for the Analysis of Scene Complexity","volume":"18","author":"Feixas","year":"2010","journal-title":"Comput. Graph. Forum"},{"key":"ref_25","unstructured":"Moosmann, F., Larlus, D., and Jurie, F. (2006, January 7\u201313). Learning saliency maps for object categorization. Proceedings of the Eccv\u201906 Workshop on the Representation & Use of Prior Knowledge in Vision, Graz, Austria."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Tian, M., Wan, S., and Yue, L. (2007). A Novel Approach for Change Detection in Remote Sensing Image Based on Saliency Map. Computer Graphics, Imaging and Visualisation, IEEE.","DOI":"10.1109\/CGIV.2007.11"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1469","DOI":"10.1109\/TPAMI.2013.200","article-title":"What Makes a Photograph Memorable?","volume":"36","author":"Isola","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ionescu, R.T., Alexe, B., Leordeanu, M., Popescu, M., Papadopoulos, D.P., and Ferrari, V. (2016, January 27\u201330). How Hard Can It Be? Estimating the Difficulty of Visual Search in an Image. Proceedings of the 2016 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.237"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ayromlou, M., Zillich, M., Ponweiser, W., and Vincze, M. (2003). Measuring scene complexity to adapt feature selection of model-based object tracking. International Conference on Computer Vision Systems, Springer.","DOI":"10.1007\/3-540-36592-3_43"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A benchmark data set for performance evaluation of aerial scene classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., and Fergus, R. (2014). Visualizing and understanding convolutional networks. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"ref_32","unstructured":"Simonyan, K., Vedaldi, A., and Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016, January 27\u201330). Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.319"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., and Torralba, A. (2010, January 13\u201318). 
Sun database: Large-scale scene recognition from abbey to zoo. Proceedings of the 2010 IEEE conference on Computer vision and pattern recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539970"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 8\u201310). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_36","unstructured":"Simonyan, K., and Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014, January 3\u20137). Caffe: Convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM international conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654889"},{"key":"ref_38","unstructured":"Liu, R., Lehman, J., Molino, P., Such, F.P., Frank, E., Sergeev, A., and Yosinski, J. (2018, January 3\u20138). An intriguing failing of convolutional neural networks and the coordconv solution. Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_40","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2014). Object Detectors Emerge in Deep Scene CNNs. 
arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"276","DOI":"10.11613\/BM.2012.031","article-title":"Interrater reliability: The kappa statistic","volume":"22","author":"McHugh","year":"2012","journal-title":"Biochem. Medica"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Li, H., Dou, X., Tao, C., Wu, Z., Chen, J., Peng, J., Deng, M., and Zhao, L. (2020). RSI-CB: A Large-Scale Remote Sensing Image Classification Benchmark Using Crowdsourced Data. Sensors, 20.","DOI":"10.3390\/s20061594"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/4\/742\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:25:20Z","timestamp":1760160320000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/4\/742"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,17]]},"references-count":42,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["rs13040742"],"URL":"https:\/\/doi.org\/10.3390\/rs13040742","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2021,2,17]]}}}