{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T08:27:44Z","timestamp":1768984064857,"version":"3.49.0"},"reference-count":51,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,1,27]],"date-time":"2019-01-27T00:00:00Z","timestamp":1548547200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61375010"],"award-info":[{"award-number":["61375010"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003787","name":"Natural Science Foundation of Hebei Province","doi-asserted-by":"publisher","award":["F2018205102"],"award-info":[{"award-number":["F2018205102"]}],"id":[{"id":"10.13039\/501100003787","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Fundamental Research Funds for the Central Universities","award":["FRF-BD-17-002A"],"award-info":[{"award-number":["FRF-BD-17-002A"]}]},{"name":"the China Scholarship Council","award":["201706465032"],"award-info":[{"award-number":["201706465032"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>With the development of low-cost RGB-D (Red Green Blue-Depth) sensors, RGB-D object recognition has attracted more and more researchers\u2019 attention in recent years. The deep learning technique has become popular in the field of image analysis and has achieved competitive results. To make full use of the effective identification information in the RGB and depth images, we propose a multi-modal deep neural network and a DS (Dempster Shafer) evidence theory based RGB-D object recognition method. 
First, the RGB and depth images are preprocessed and two convolutional neural networks are trained, respectively. Next, we perform multi-modal feature learning using the proposed quadruplet samples based objective function to fine-tune the network parameters. Then, two probability classification results are obtained using two sigmoid SVMs (Support Vector Machines) with the learned RGB and depth features. Finally, the DS evidence theory based decision fusion method is used for integrating the two classification results. Compared with other RGB-D object recognition methods, our proposed method adopts two fusion strategies: Multi-modal feature learning and DS decision fusion. Both the discriminative information of each modality and the correlation information between the two modalities are exploited. Extensive experimental results have validated the effectiveness of the proposed method.<\/jats:p>","DOI":"10.3390\/s19030529","type":"journal-article","created":{"date-parts":[[2019,1,29]],"date-time":"2019-01-29T03:40:55Z","timestamp":1548733255000},"page":"529","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4137-7424","authenticated-orcid":false,"given":"Hui","family":"Zeng","sequence":"first","affiliation":[{"name":"School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China"},{"name":"Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China"}]},{"given":"Bin","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China"},{"name":"Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, 
China"}]},{"given":"Xiuqing","family":"Wang","sequence":"additional","affiliation":[{"name":"Vocational & Technical Institute, Hebei Normal University, Shijiazhuang 050024, China"}]},{"given":"Jiwei","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China"},{"name":"Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China"}]},{"given":"Dongmei","family":"Fu","sequence":"additional","affiliation":[{"name":"School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China"},{"name":"Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"4669","DOI":"10.1109\/TIP.2017.2696744","article-title":"Track Everything: Limiting Prior Knowledge in Online Multi-Object Recognition","volume":"26","author":"Wong","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1383","DOI":"10.1109\/TPAMI.2015.2491940","article-title":"A Global Hypothesis Verification Framework for 3D Object Recognition in Clutter","volume":"38","author":"Aldoma","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Oliveira, F.F., Souza, A.A.F., Fernandes, M.A.C., Gomes, R.B., and Goncalves, L.M.G. (2018). Efficient 3D Objects Recognition Using Multifoveated Point Clouds. Sensors, 18.","DOI":"10.3390\/s18072302"},{"key":"ref_4","first-page":"1862","article-title":"A Feature Learning and Object Recognition Framework for Underwater Fish Images","volume":"25","author":"Chuang","year":"2016","journal-title":"IEEE Trans.
Image Process."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Gandarias, J.M., G\u00f3mez-de-Gabriel, J.M., and Garc\u00eda-Cerezo, A.J. (2018). Enhancing Perception with Tactile Object Recognition in Adaptive Grippers for Human\u2013Robot Interaction. Sensors, 18.","DOI":"10.3390\/s18030692"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.patrec.2015.12.006","article-title":"A comparative study of data fusion for RGB-D based visual recognition","volume":"73","author":"Hua","year":"2016","journal-title":"Pattern Recognit. Lett."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1016\/j.patcog.2017.06.037","article-title":"Multi-modal uniform deep learning for RGB-D person re-identification","volume":"72","author":"Ren","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1016\/j.patcog.2017.07.026","article-title":"Multi-modal deep feature learning for RGB-D object detection","volume":"72","author":"Xu","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1016\/j.neucom.2015.03.017","article-title":"Subset based deep learning for RGB-D object recognition","volume":"165","author":"Bai","year":"2015","journal-title":"Neurocomputing"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1016\/j.patcog.2016.08.016","article-title":"Learning coupled classifiers with RGB images for RGB-D object recognition","volume":"61","author":"Li","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive Image Features from Scale-Invariant Keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput.
Vis."},{"key":"ref_12","first-page":"404","article-title":"SURF: Speeded up Robust Features","volume":"Volume 3951","author":"Bay","year":"2006","journal-title":"Proceedings of the European Conference on Computer Vision, Lecture Notes in Computer Science"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1016\/S0262-8856(98)00074-2","article-title":"Surface matching for object recognition in complex three-dimensional scenes","volume":"16","author":"Johnson","year":"1998","journal-title":"Image Vis. Comput."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1109\/34.765655","article-title":"Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes","volume":"21","author":"Johnson","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Schwarz, M., Schulz, H., and Behnke, S. (2015, January 26\u201330). RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features. Proceedings of the IEEE International Conference on Robotics and Automation, Seattle, WA, USA.","DOI":"10.1109\/ICRA.2015.7139363"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/j.cviu.2015.05.007","article-title":"Semi-supervised learning and feature evaluation for RGB-D object recognition","volume":"139","author":"Cheng","year":"2015","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_17","first-page":"1611","article-title":"Evidence theory and differential evolution based uncertainty quantification for buckling load of semi-rigid jointed Frames","volume":"40","author":"Tang","year":"2015","journal-title":"Acad. 
Sci."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1114","DOI":"10.21629\/JSEE.2017.06.09","article-title":"Temporal evidence combination method for multi-sensor target recognition based on DS theory and IFS","volume":"28","author":"Wang","year":"2017","journal-title":"J. Syst. Eng. Electron."},{"key":"ref_19","unstructured":"Kuang, Y., and Li, L. (2013, January 23\u201325). Speech emotion recognition of decision fusion based on DS evidence theory. Proceedings of the IEEE 4th International Conference on Software Engineering and Service Science, Beijing, China."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1247","DOI":"10.1109\/LGRS.2015.2390914","article-title":"Target Recognition via Information Aggregation through Dempster\u2013Shafer\u2019s Evidence Theory","volume":"12","author":"Dong","year":"2015","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lai, K., Bo, L., Ren, X., and Fox, D. (2011, January 9\u201313). A large-scale hierarchical multi-view RGB-D object dataset. Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China.","DOI":"10.1109\/ICRA.2011.5980382"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bo, L., Ren, X., and Fox, D. (2011, January 25\u201330). Depth kernel descriptors for object recognition. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA.","DOI":"10.1109\/IROS.2011.6095119"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1651","DOI":"10.1109\/TPAMI.2015.2491925","article-title":"Structure-Preserving Binary Representations for RGB-D Action Recognition","volume":"38","author":"Yu","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach.
Intell."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"558","DOI":"10.1016\/j.robot.2015.09.027","article-title":"CoSPAIR: Colored Histograms of Spatial Concentric Surflet-Pairs for 3D object recognition","volume":"75","author":"Logoglu","year":"2016","journal-title":"Robot. Auton. Syst."},{"key":"ref_25","unstructured":"Bo, L., Ren, X., and Fox, D. (2012, January 18\u201321). Unsupervised Feature Learning for RGB-D Based Object Recognition. Proceedings of the International Symposium on Experimental Robotics, Qu\u00e9bec City, QC, Canada."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Blum, M., Springenberg, J.T., W\u00fclfing, J., and Riedmiller, M. (2012, January 14\u201318). A learned feature descriptor for object recognition in RGB-D data. Proceedings of the IEEE International Conference on Robotics and Automation, St. Paul, MN, USA.","DOI":"10.1109\/ICRA.2012.6225188"},{"key":"ref_27","unstructured":"Asif, U., Bennamoun, M., and Sohel, F. (October, January 28). Discriminative feature learning for efficient RGB-D object recognition. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany."},{"key":"ref_28","unstructured":"Huang, Y., Zhu, F., Shao, L., and Frangi, A.F. (2016, January 16\u201321). Color Object Recognition via Cross-Domain Learning on RGB-D Images. Proceedings of the IEEE International Conference on Robotics and Automation, Stockholm, Sweden."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Li, F., Liu, H., Xu, X., and Sun, F. (2016, January 24\u201329). Multi-Modal Local Receptive Field Extreme Learning Machine for object recognition. Proceedings of the International Joint Conference on Neural Networks, Vancouver, BC, Canada.","DOI":"10.1109\/IJCNN.2016.7727402"},{"key":"ref_30","unstructured":"Socher, R., Huval, B., Bhat, B., Manning, C.D., and Ng, A.Y. (2012, January 3\u20136). Convolutional-Recursive Deep Learning for 3D Object Classification. 
Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1887","DOI":"10.1109\/TMM.2015.2476655","article-title":"Large-Margin Multi-Modal Deep Learning for RGB-D Object Recognition","volume":"17","author":"Wang","year":"2015","journal-title":"IEEE Trans. Multimed."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rahman, M.M., Tan, Y., Xue, J., and Lu, K. (2017, January 10\u201314). RGB-D object recognition with multimodal deep convolutional neural networks. Proceedings of the IEEE International Conference on Multimedia and Expo, Hong Kong, China.","DOI":"10.1109\/ICME.2017.8019538"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Tang, L., Yang, Z.X., and Jia, K. (2018). Canonical Correlation Analysis Regularization: An Effective Deep Multi-View Learning Baseline for RGB-D Object Recognition. IEEE Trans. Cogn. Dev. Syst.","DOI":"10.1109\/TCDS.2018.2866587"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zia, S., Y\u00fcksel, B., Y\u00fcret, D., and Yemez, Y. (2017, January 22\u201329). RGB-D Object Recognition Using Deep Convolutional Neural Networks. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.109"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Song, S., and Xiao, J. (2013, January 1\u20138). Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines. Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.36"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.cviu.2016.05.011","article-title":"Occlusion aware particle filter tracker to handle complex and persistent occlusions","volume":"150","author":"Meshgi","year":"2016","journal-title":"Comput. Vis. 
Image Underst."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Camplani, M., Hannuna, S., Mirmehdi, M., Damen, D., Paiement, A., Tao, L., and Burghardt, T. (2015, January 7\u201310). Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling. Proceedings of the British Machine Vision Conference, Swansea, UK.","DOI":"10.5244\/C.29.145"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gupta, S., Girshick, R., Arbel\u00e1ez, P., and Malik, J. (2014, January 6\u201312). Learning Rich Features from RGB-D Images for Object Detection and Segmentation. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10584-0_23"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Gupta, S., Arbel\u00e1ez, P., and Malik, J. (2013, January 23\u201328). Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.79"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Song, S., Lichtenberg, S.P., and Xiao, J. (2015, January 7\u201312). SUN RGB-D: A RGB-D scene understanding benchmark suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Gupta, S., Hoffman, J., and Malik, J. (2016, January 27\u201330). Cross Modal Distillation for Supervision Transfer. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.309"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Song, S., and Xiao, J. (2016, January 27\u201330). Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.94"},{"key":"ref_43","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). ImageNet classification with deep convolutional neural networks. Proceedings of the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Chen, W., Chen, X., Zhang, J., and Huang, K. (2017, January 21\u201326). Beyond Triplet loss: A deep quadruplet network for person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.145"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhang, D., Zhao, L., Xu, D., and Lu, D. (2017, January 11\u201314). Learning Local Feature Descriptor with Quadruplet Ranking Loss. Proceedings of the CCF Chinese Conference on Computer Vision, Tianjin, China.","DOI":"10.1007\/978-981-10-7299-4_17"},{"key":"ref_46","unstructured":"Platt, J.C. (2000). Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. Advances in Large Margin Classifiers, MIT Press."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Bo, L., Lai, K., Ren, X., and Fox, D. (2011, January 20\u201325). Object recognition with hierarchical kernel descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2011.5995719"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Wang, A., Cai, J., Lu, J., and Cham, T.J. (2015, January 7\u201313). MMSS: Multi-modal Sharable and Specific Feature Learning for RGB-D Object Recognition. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.134"},{"key":"ref_50","unstructured":"Eitel, A., Springenberg, J.T., Spinello, L., Riedmiller, M., and Burgard, W. (October, January 28). Multimodal Deep Learning for Robust RGB-D Object Recognition. Proceedings of the IEEE\/RSJ International Conference on Intelligence Robots and Systems, Hamburg, Germany."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Cheng, Y., Cai, R., Zhao, X., and Huang, K. (2015, January 19\u201322). Convolutional Fisher Kernels for RGB-D Object Recognition. Proceedings of the International Conference on 3D Vision, Lyon, France.","DOI":"10.1109\/3DV.2015.23"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/3\/529\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:28:59Z","timestamp":1760185739000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/3\/529"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,27]]},"references-count":51,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,2]]}},"alternative-id":["s19030529"],"URL":"https:\/\/doi.org\/10.3390\/s19030529","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,1,27]]}}}