{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T18:38:56Z","timestamp":1769020736166,"version":"3.49.0"},"reference-count":54,"publisher":"MDPI AG","issue":"23","license":[{"start":{"date-parts":[[2021,11,28]],"date-time":"2021-11-28T00:00:00Z","timestamp":1638057600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002383","name":"King Saud University","doi-asserted-by":"publisher","award":["RSP-2021\/322"],"award-info":[{"award-number":["RSP-2021\/322"]}],"id":[{"id":"10.13039\/501100002383","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area of using depth information in addition to color image (RGB) data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images has to deal with integrating these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel\u2019s local surface normal (HHA) to apply transfer learning using networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for the real-time prediction of scenes, although this may be less important during the training phase. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding approach performs equally well or better in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized and pretrained VGG16 Net. We address the class imbalance problem seen in the image dataset using a method based on the synthetic minority oversampling technique (SMOTE) at the feature level. With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.<\/jats:p>","DOI":"10.3390\/s21237950","type":"journal-article","created":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T01:45:02Z","timestamp":1638323102000},"page":"7950","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Convolution-Based Encoding of Depth Images for Transfer Learning in RGB-D Scene Classification"],"prefix":"10.3390","volume":"21","author":[{"given":"Radhakrishnan","family":"Gopalapillai","sequence":"first","affiliation":[{"name":"Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Deepa","family":"Gupta","sequence":"additional","affiliation":[{"name":"Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammed","family":"Zakariah","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11543, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0998-8978","authenticated-orcid":false,"given":"Yousef Ajami","family":"Alotaibi","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11543, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1007\/s10846-011-9608-y","article-title":"Johnny: An Autonomous Service Robot for Domestic Environments","volume":"66","author":"Breuer","year":"2012","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/s40638-017-0061-7","article-title":"Assessment of personal care and medical robots from older adults\u2019 perspective","volume":"4","author":"Goher","year":"2017","journal-title":"Robot. Biomim."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1016\/j.procs.2020.04.025","article-title":"Object Boundary Identification using Two-phase Incremental Clustering","volume":"171","author":"Gopalapillai","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1007\/978-3-319-01778-5_46","article-title":"Experimentation and Analysis of Time Series Data for Rescue Robotics","volume":"Volume 235","author":"Thampi","year":"2014","journal-title":"Recent Advances in Intelligent Informatics"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Song, S., Lichtenberg, S.P., and Xiao, J. (2015, January 7\u201312). SUN RGB-D: A RGB-D scene understanding benchmark suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/j.eswa.2017.02.040","article-title":"On robot indoor scene classification based on descriptor quality and efficiency","volume":"79","year":"2017","journal-title":"Expert. Syst. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.procs.2017.09.077","article-title":"Pattern identification of robotic environments using machine learning techniques","volume":"115","author":"Gopalapillai","year":"2017","journal-title":"Procedia Comput. Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic Minority Over-Sampling Technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1109\/JPROC.1997.554212","article-title":"Sensor fusion for mobile robot navigation","volume":"85","author":"Kam","year":"1997","journal-title":"Proc. IEEE"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Mimouna, A., Alouani, I., Ben Khalifa, A., El Hillali, Y., Taleb-Ahmed, A., Menhaj, A., Ouahabi, A., and Ben Amara, N.E. (2020). OLIMP: A Heterogeneous Multimodal Dataset for Advanced Environment Perception. Electronics, 9.","DOI":"10.3390\/electronics9040560"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Radhakrishnan, G., Gupta, D., Abhishek, R., Ajith, A., and Tsb, S. (2012, January 27\u201329). Analysis of multimodal time series data of robotic environment. Proceedings of the 12th International Conference on Intelligent Systems Design and Applications (ISDA), Kochi, India.","DOI":"10.1109\/ISDA.2012.6416628"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"De Silva, V., Roche, J., and Kondoz, A. (2018). Robust fusion of LiDAR and wide-angle camera data for autonomous mobile robots. Sensors, 18.","DOI":"10.3390\/s18082730"},{"key":"ref_13","first-page":"3967","article-title":"Robotic sensor data analysis using stream data mining techniques","volume":"7","author":"Gopalapillai","year":"2018","journal-title":"Int. J. Eng. Technol."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TRO.2015.2496823","article-title":"Visual Place Recognition: A Survey","volume":"32","author":"Lowry","year":"2016","journal-title":"IEEE Trans. Robot."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201325). Object Recognition from Local Scale-Invariant Features. Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1109\/34.765655","article-title":"Using spin images for efficient object recognition in cluttered 3D scenes","volume":"21","author":"Johnson","year":"1999","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of oriented gradients for human detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1007\/11744023_32","article-title":"SURF: Speeded Up Robust Features","volume":"Volume 3951","author":"Leonardis","year":"2006","journal-title":"Computer Vision\u2014ECCV 2006"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1023\/A:1011139631724","article-title":"Modeling the shape of the scene: A holistic representation of the spatial envelop","volume":"42","author":"Oliva","year":"2001","journal-title":"Int. J. Comput. Vis."},{"key":"ref_20","first-page":"1489","article-title":"CENTRIST: A Visual Descriptor for Scene Categorization","volume":"33","author":"Wu","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"107205","DOI":"10.1016\/j.patcog.2020.107205","article-title":"Scene recognition: A comprehensive survey","volume":"102","author":"Xie","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1080\/01431160600746456","article-title":"A survey of image classification methods and techniques for improving classification performance","volume":"28","author":"Lu","year":"2007","journal-title":"Int. J. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1186\/s10033-021-00598-9","article-title":"ML-ANet: A Transfer Learning Approach Using Adaptation Network for Multi-label Image Classification in Autonomous Driving","volume":"34","author":"Li","year":"2021","journal-title":"Chin. J. Mech. Eng."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"106617","DOI":"10.1016\/j.knosys.2020.106617","article-title":"A deep learning based image enhancement approach for autonomous driving at night","volume":"213","author":"Li","year":"2021","journal-title":"Knowl. Based Syst."},{"key":"ref_25","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_26","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). Very deep convolutional networks for large-scale image recognition. Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_29","first-page":"487","article-title":"Learning deep features for scene recognition using places database","volume":"27","author":"Zhou","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/j.eswa.2016.10.038","article-title":"Growing random forest on deep convolutional neural networks for scene categorization","volume":"71","author":"Bai","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Damodaran, N., Sowmya, V., Govind, D., and Soman, K.P. (2019). Single-plane scene classification using deep convolution features. Soft Computing and Signal Processing, Springer.","DOI":"10.1007\/978-981-13-3600-3_71"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. Computer Vision\u2014ECCV 2012, Springer.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"ref_33","unstructured":"Eitel, A.J., Springenberg, T., Spinello, L., Riedmiller, M., and Burgard, W. (October, January 28). Multimodal deep learning for robust RGB-D object recognition. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1177\/0278364914549607","article-title":"Deep learning for detecting robotic grasps","volume":"34","author":"Lenz","year":"2015","journal-title":"Int. J. Robot. Res."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Gupta, S., Girshick, R., Arbel\u00e1ez, P., and Malik, J. (2014). Learning rich features from RGB-D images for object detection and segmentation. Computer Vision\u2014ECCV 2014, Springer.","DOI":"10.1007\/978-3-319-10584-0_23"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhu, H., Weibel, J., and Lu, S. (2016, January 27\u201330). Discriminative multi-modal feature fusion for RGBD indoor scene recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.324"},{"key":"ref_37","unstructured":"Liao, Y., Kodagoda, S., Wang, Y., Shi, L., and Liu, Y. (2016, January 16\u201321). Understand scene categories by objects: A semantic regularized scene classifier using Convolutional Neural Networks. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), New York, NY, USA."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, J., Cheng, Y., Huang, K., and Tan, T. (2018, January 2\u20137). DF2Net: Discriminative feature learning and fusion network for rgb-d indoor scene classification. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12292"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1109\/TIP.2018.2872629","article-title":"Learning effective RGB-D representations for scene recognition","volume":"28","author":"Song","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"106739","DOI":"10.1109\/ACCESS.2019.2932080","article-title":"RGB-D Scene recognition via spatial-related multi-modal feature learning","volume":"7","author":"Xiong","year":"2019","journal-title":"IEEE Access"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"2722","DOI":"10.1109\/TIP.2021.3053459","article-title":"ASK: Adaptively selecting key local features for RGB-D scene recognition","volume":"30","author":"Xiong","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"4499","DOI":"10.1007\/s11042-019-7684-3","article-title":"A survey on indoor RGB-D semantic segmentation: From hand-crafted features to deep convolutional neural networks","volume":"79","author":"Fooladgar","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Du, D., Wang, L., Wang, H., Zhao, K., and Wu, G. (2019, January 15\u201320). Translate-to-recognize networks for RGB-D scene recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01211"},{"key":"ref_44","unstructured":"Ayub, A., and Wagner, A.R. (2020, January 7\u201310). Centroid Based Concept Learning for RGB-D Indoor Scene Classification. Proceedings of the British Machine Vision Conference (BMVC), Virtual Event, UK."},{"key":"ref_45","unstructured":"Yuan, Y., Xiong, Z., and Wang, Q. (February, January 27). ACM: Adaptive Cross-Modal Graph Convolutional Neural Networks for RGB-D Scene Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1859","DOI":"10.1109\/ACCESS.2018.2886133","article-title":"Indoor scene understanding in 2.5\/3D for autonomous agents: A survey","volume":"7","author":"Naseer","year":"2018","journal-title":"IEEE Access"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","article-title":"A systematic study of the class imbalance problem in convolutional neural networks","volume":"106","author":"Buda","year":"2018","journal-title":"Neural Netw."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.patrec.2021.07.017","article-title":"Imbalanced image classification with complement cross entropy","volume":"151","author":"Kim","year":"2021","journal-title":"Pattern Recognit. Lett."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Ren, Y., Zhang, X., Ma, Y., Yang, Q., Wang, C., Liu, H., and Qi, Q. (2020). Full Convolutional Neural Network Based on Multi-Scale Feature Fusion for the Class Imbalance Remote Sensing Image Classification. Remote Sens., 12.","DOI":"10.3390\/rs12213547"},{"key":"ref_50","unstructured":"Wong, S.C., Gatt, A., Stamatescu, V., and McDonnell, M.D. (December, January 30). Understanding data augmentation for classification: When to warp?. Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Janoch, A., Karayev, S., Jia, Y., Barron, J.T., Fritz, M., Saenko, K., and Darrell, T. (2011, January 6\u201313). A category-level 3-d object dataset: Putting the kinect to work. Proceedings of the ICCV Workshop on Consumer Depth Cameras for Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130382"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Xiao, J., Owens, A., and Torralba, A. (2013, January 1\u20138). SUN3D: A database of big spaces reconstructed using SfM and object labels. Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.458"},{"key":"ref_53","first-page":"1","article-title":"Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning","volume":"18","author":"Nogueira","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"ref_54","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/23\/7950\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:37:05Z","timestamp":1760168225000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/23\/7950"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,28]]},"references-count":54,"journal-issue":{"issue":"23","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["s21237950"],"URL":"https:\/\/doi.org\/10.3390\/s21237950","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,28]]}}}