{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:46:09Z","timestamp":1778082369911,"version":"3.51.4"},"reference-count":86,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2020,7,8]],"date-time":"2020-07-08T00:00:00Z","timestamp":1594166400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61703195"],"award-info":[{"award-number":["61703195"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003392","name":"Natural Science Foundation of Fujian Province","doi-asserted-by":"publisher","award":["2019J01756"],"award-info":[{"award-number":["2019J01756"]}],"id":[{"id":"10.13039\/501100003392","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003410","name":"Department of Education, Fujian Province","doi-asserted-by":"publisher","award":["The Distinguished Young Scholars Program of Fujian Universities"],"award-info":[{"award-number":["The Distinguished Young Scholars Program of Fujian Universities"]}],"id":[{"id":"10.13039\/501100003410","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fuzhou Technology Planning Program","award":["2018-G-96, 2018-G-98"],"award-info":[{"award-number":["2018-G-96, 2018-G-98"]}]},{"DOI":"10.13039\/501100009696","name":"Minjiang University","doi-asserted-by":"publisher","award":["MJY19021, MJY19022"],"award-info":[{"award-number":["MJY19021, MJY19022"]}],"id":[{"id":"10.13039\/501100009696","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>We address the problem of localizing waste objects from a color image and an optional depth image, which is a key 
perception component for robotic interaction with such objects. Specifically, our method integrates the intensity and depth information at multiple levels of spatial granularity. Firstly, a scene-level deep network produces an initial coarse segmentation, based on which we select a few potential object regions to zoom in and perform fine segmentation. The results of the above steps are further integrated into a densely connected conditional random field that learns to respect the appearance, depth, and spatial affinities with pixel-level accuracy. In addition, we create a new RGBD waste object segmentation dataset, MJU-Waste, that is made public to facilitate future research in this area. The efficacy of our method is validated on both MJU-Waste and the Trash Annotation in Context (TACO) dataset.<\/jats:p>","DOI":"10.3390\/s20143816","type":"journal-article","created":{"date-parts":[[2020,7,8]],"date-time":"2020-07-08T11:47:46Z","timestamp":1594208866000},"page":"3816","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":70,"title":["A Multi-Level Approach to Waste Object Segmentation"],"prefix":"10.3390","volume":"20","author":[{"given":"Tao","family":"Wang","sequence":"first","affiliation":[{"name":"Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou 350108, China"},{"name":"College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China"},{"name":"NetDragon Inc., Fuzhou 350001, China"}]},{"given":"Yuanzheng","family":"Cai","sequence":"additional","affiliation":[{"name":"Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou 350108, China"},{"name":"College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China"}]},{"given":"Lingyu","family":"Liang","sequence":"additional","affiliation":[{"name":"School of Electronic and Information Engineering, 
South China University of Technology, Guangzhou 510641, China"}]},{"given":"Dongyi","family":"Ye","sequence":"additional","affiliation":[{"name":"College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,7,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"225","DOI":"10.3758\/BF03211502","article-title":"Visual attention within and around the field of focal attention: A zoom lens model","volume":"40","author":"Eriksen","year":"1986","journal-title":"Percept. Psychophys."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1068\/p160089","article-title":"Spatial frequency and selective attention to local and global information","volume":"16","author":"Shulman","year":"1987","journal-title":"Perception"},{"key":"ref_3","unstructured":"Pashler, H.E. (1999). The Psychology of Attention, MIT Press."},{"key":"ref_4","unstructured":"Palmer, S.E. (1999). Vision Science: Photons to Phenomenology, MIT Press."},{"key":"ref_5","unstructured":"Proen\u00e7a, P.F., and Sim\u00f5es, P. (2020). TACO: Trash Annotations in Context for Litter Detection. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2189","DOI":"10.1109\/TPAMI.2012.28","article-title":"Measuring the objectness of image windows","volume":"34","author":"Alexe","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective search for object recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zitnick, C.L., and Doll\u00e1r, P. (2014). Edge boxes: Locating object proposals from edges. 
European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_26"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Cheng, M.M., Zhang, Z., Lin, W.Y., and Torr, P. (2014, January 23\u201328). BING: Binarized normed gradients for objectness estimation at 300fps. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.414"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wang, Y., Liu, J., Li, Y., Yan, J., and Lu, H. (2016, January 15\u201319). Objectness-aware semantic segmentation. Proceedings of the 24th ACM international conference on Multimedia, Amsterdam, The Netherlands.","DOI":"10.1145\/2964284.2967232"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Xia, F., Wang, P., Chen, L.C., and Yuille, A.L. (2016). Zoom better to see clearer: Human and object parsing with hierarchical auto-zoom net. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46454-1_39"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Alexe, B., Deselaers, T., and Ferrari, V. (2010). Classcut for unsupervised class segmentation. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15555-0_28"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Vezhnevets, A., Ferrari, V., and Buhmann, J.M. (2011, January 6\u201313). Weakly supervised semantic segmentation with a multi-image model. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126299"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_15","unstructured":"Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Zhang, Z., Lin, H., Sun, Y., He, T., Mueller, J., and Manmatha, R. (2020). ResNeSt: Split-Attention Networks. arXiv."},{"key":"ref_16","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, NeurIPS Foundation."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_18","unstructured":"Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques, MIT Press."},{"key":"ref_19","unstructured":"Kr\u00e4henb\u00fchl, P., and Koltun, V. (2011). Efficient inference in fully connected crfs with gaussian edge potentials. Advances in Neural Information Processing Systems, NeurIPS Foundation."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1318","DOI":"10.1109\/TCYB.2013.2265378","article-title":"Enhanced computer vision with microsoft kinect sensor: A review","volume":"43","author":"Han","year":"2013","journal-title":"IEEE Trans. Cybern."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (November, January 27). 
Ccnet: Criss-cross attention for semantic segmentation. Proceedings of the 2019 International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00069"},{"key":"ref_23","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_24","unstructured":"Yang, M., and Thung, G. (2016). Classification of Trash for Recyclability Status, Stanford University. CS229 Project Report."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Bircano\u011flu, C., Atay, M., Be\u015fer, F., Gen\u00e7, \u00d6., and K\u0131zrak, M.A. (2018, January 3\u20135). RecycleNet: Intelligent waste sorting using deep neural networks. Proceedings of the 2018 Innovations in Intelligent Systems and Applications (INISTA), Thessaloniki, Greece.","DOI":"10.1109\/INISTA.2018.8466276"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Aral, R.A., Keskin, \u015e.R., Kaya, M., and Hac\u0131\u00f6mero\u011flu, M. (2018, January 10\u201313). Classification of trashnet dataset based on deep learning models. Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA.","DOI":"10.1109\/BigData.2018.8622212"},{"key":"ref_27","unstructured":"Awe, O., Mengistu, R., and Sreedhar, V. (2017). Smart Trash Net: Waste Localization and Classification, Stanford University. CS229 Project Report."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"5060857","DOI":"10.1155\/2018\/5060857","article-title":"Multilayer hybrid deep-learning method for waste classification and recycling","volume":"2018","author":"Chu","year":"2018","journal-title":"Comput. Intell. 
Neurosci."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"178631","DOI":"10.1109\/ACCESS.2019.2959033","article-title":"A Novel Framework for Trash Classification Using Deep Transfer Learning","volume":"7","author":"Vo","year":"2019","journal-title":"IEEE Access"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ramalingam, B., Lakshmanan, A.K., Ilyas, M., Le, A.V., and Elara, M.R. (2018). Cascaded machine-learning technique for debris classification in floor-cleaning robot application. Appl. Sci., 8.","DOI":"10.3390\/app8122649"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yin, J., Apuroop, K.G.S., Tamilselvam, Y.K., Mohan, R.E., Ramalingam, B., and Le, A.V. (2020). Table Cleaning Task by Human Support Robot Using Deep Learning Technique. Sensors, 20.","DOI":"10.3390\/s20061698"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rad, M.S., von Kaenel, A., Droux, A., Tieche, F., Ouerhani, N., Ekenel, H.K., and Thiran, J.P. (2017). A computer vision system to localize and classify wastes on the streets. International Conference on Computer Vision Systems, Springer.","DOI":"10.1007\/978-3-319-68345-4_18"},{"key":"ref_33","unstructured":"Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. (2013). Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"01056","DOI":"10.1051\/matecconf\/201823201056","article-title":"Autonomous garbage detection for intelligent urban management","volume":"232","author":"Wang","year":"2018","journal-title":"MATEC Web Conf."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1109\/TCE.2018.2859629","article-title":"Deep learning based robot for automatically picking up garbage on the grass","volume":"64","author":"Bai","year":"2018","journal-title":"IEEE Trans. Consum. 
Electron."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Deepa, T., and Roka, S. (2017, January 17\u201319). Estimation of garbage coverage area in water terrain. Proceedings of the 2017 International Conference On Smart Technologies For Smart Nation (SmartTechCon), Bengaluru, India.","DOI":"10.1109\/SmartTechCon.2017.8358394"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Mittal, G., Yagnik, K.B., Garg, M., and Krishnan, N.C. (2016, January 12\u201316). Spotgarbage: Smartphone app to detect garbage using deep learning. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany.","DOI":"10.1145\/2971648.2971731"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"104514","DOI":"10.1109\/ACCESS.2019.2932117","article-title":"Multi-scale CNN based garbage detection of airborne hyperspectral data","volume":"7","author":"Zeng","year":"2019","journal-title":"IEEE Access"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Qiu, Y., Chen, J., Guo, J., Zhang, J., Liu, S., and Chen, S. (2017). Three dimensional object segmentation based on spatial adaptive projection for solid waste. CCF Chinese Conference on Computer Vision, Springer.","DOI":"10.1007\/978-981-10-7305-2_39"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, C., Liu, S., Zhang, J., Feng, Y., and Chen, S. (2017). RGB-D Based Object Segmentation in Severe Color Degraded Environment. 
CCF Chinese Conference on Computer Vision, Springer.","DOI":"10.1007\/978-981-10-7305-2_40"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Grard, M., Br\u00e9gier, R., Sella, F., Dellandr\u00e9a, E., and Chen, L. (2019). Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network. Human Friendly Robotics, Springer.","DOI":"10.1007\/978-3-319-89327-3_16"},{"key":"ref_43","unstructured":"He, X., Zemel, R.S., and Carreira-Perpi\u00f1\u00e1n, M.\u00c1. (July, January 27). Multiscale conditional random fields for image labeling. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Shotton, J., Winn, J., Rother, C., and Criminisi, A. (2006). Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. European Conference on Computer Vision, Springer.","DOI":"10.1007\/11744023_1"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Ladick\u00fd, L., Russell, C., Kohli, P., and Torr, P.H. (October, January 29). Associative hierarchical crfs for object class image segmentation. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459248"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Gould, S., Fulton, R., and Koller, D. (October, January 27). Decomposing a scene into geometric and semantically consistent regions. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459211"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Kumar, M.P., and Koller, D. (2010, January 13\u201318). Efficiently selecting regions for scene understanding. 
Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540072"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Munoz, D., Bagnell, J.A., and Hebert, M. (2010). Stacked hierarchical labeling. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15567-3_5"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Tighe, J., and Lazebnik, S. (2010). Superparsing: Scalable nonparametric image parsing with superpixels. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15555-0_26"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"2368","DOI":"10.1109\/TPAMI.2011.131","article-title":"Nonparametric scene parsing via label transfer","volume":"33","author":"Liu","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_51","unstructured":"Lempitsky, V., Vedaldi, A., and Zisserman, A. (2011). Pylon model for semantic segmentation. Advances in Neural Information Processing Systems, NeurIPS Foundation."},{"key":"ref_52","unstructured":"Liu, W., Rabinovich, A., and Berg, A.C. (2015). Parsenet: Looking wider to see better. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Mostajabi, M., Yadollahpour, P., and Shakhnarovich, G. (2015, January 7\u201312). Feedforward semantic segmentation with zoom-out features. Proceedings of the 28th IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298959"},{"key":"ref_54","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Lin, G., Shen, C., Van Den Hengel, A., and Reid, I. (July, January 26). Efficient piecewise training of deep structured models for semantic segmentation. 
Proceedings of the 2016 Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.348"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., and Agrawal, A. (2018, January 18\u201323). Context encoding for semantic segmentation. Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00747"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Ding, H., Jiang, X., Shuai, B., Qun Liu, A., and Wang, G. (2018, January 18\u201323). Context contrasted feature and gated multi-scale aggregation for scene segmentation. Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00254"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Lin, D., Ji, Y., Lischinski, D., Cohen-Or, D., and Huang, H. (2018, January 8\u201314). Multi-scale context intertwining for semantic segmentation. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01219-9_37"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., and Torr, P.H. (2015, January 11\u201318). Conditional random fields as recurrent neural networks. Proceedings of the 2015 International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.179"},{"key":"ref_60","unstructured":"Schwing, A.G., and Urtasun, R. (2015). Fully connected deep structured networks. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Liu, Z., Li, X., Luo, P., Loy, C.C., and Tang, X. (2015, January 11\u201318). Semantic image segmentation via deep parsing network. 
Proceedings of the 2015 International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.162"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Jampani, V., Kiefel, M., and Gehler, P.V. (July, January 26). Learning sparse high dimensional filters: Image filtering, dense crfs and bilateral neural networks. Proceedings of the 2016 Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.482"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Chandra, S., and Kokkinos, I. (2016). Fast, exact and multi-scale inference for semantic image segmentation with deep gaussian crfs. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46478-7_25"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Pohlen, T., Hermans, A., Mathias, M., and Leibe, B. (2017, January 21\u201326). Full-resolution residual networks for semantic segmentation in street scenes. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.353"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Gadde, R., Jampani, V., Kiefel, M., Kappler, D., and Gehler, P.V. (2016). Superpixel convolutional networks using bilateral inceptions. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46448-0_36"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Li, X., Jie, Z., Wang, W., Liu, C., Yang, J., Shen, X., Lin, Z., Chen, Q., Yan, S., and Feng, J. (2017, January 22\u201329). Foveanet: Perspective-aware urban scene parsing. Proceedings of the 2017 International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.91"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Noh, H., Hong, S., and Han, B. (2015, January 11\u201318). Learning deconvolution network for semantic segmentation. 
Proceedings of the 2015 International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.178"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing And Computer-Assisted Intervention, Springer.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_69","unstructured":"Yuan, Y., and Wang, J. (2018). Ocnet: Object context network for scene parsing. arXiv."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zhang, Y., Liu, S., Shi, J., Change Loy, C., Lin, D., and Jia, J. (2018, January 8\u201314). Psanet: Point-wise spatial attention network for scene parsing. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_17"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Takikawa, T., Acuna, D., Jampani, V., and Fidler, S. (November, January 27). Gated-scnn: Gated shape cnns for semantic segmentation. 
Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00533"},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Gupta, S., Girshick, R., Arbel\u00e1ez, P., and Malik, J. (2014). Learning rich features from RGB-D images for object detection and segmentation. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10584-0_23"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Li, Z., Gan, Y., Liang, X., Yu, Y., Cheng, H., and Lin, L. (2016). Lstm-cf: Unifying context modeling and fusion with lstms for rgb-d scene labeling. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46475-6_34"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Eigen, D., and Fergus, R. (2015, January 7\u201313). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.304"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Hu, X., Yang, K., Fei, L., and Wang, K. (2019, January 22\u201325). Acnet: Attention based network to exploit complementary features for rgbd semantic segmentation. 
Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803025"},{"key":"ref_80","unstructured":"Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2016). Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture. Asian Conference on Computer Vision, Springer."},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Qi, X., Liao, R., Jia, J., Fidler, S., and Urtasun, R. (2017, January 22\u201329). 3D graph neural networks for rgbd semantic segmentation. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.556"},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Lai, K., Bo, L., Ren, X., and Fox, D. (2011, January 9\u201313). A large-scale hierarchical multi-view rgb-d object dataset. Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China.","DOI":"10.1109\/ICRA.2011.5980382"},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_84","unstructured":"Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer."},{"key":"ref_85","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, NeurIPS Foundation."},{"key":"ref_86","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. 
arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/14\/3816\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:48:59Z","timestamp":1760176139000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/14\/3816"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,8]]},"references-count":86,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2020,7]]}},"alternative-id":["s20143816"],"URL":"https:\/\/doi.org\/10.3390\/s20143816","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,8]]}}}