{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T05:41:21Z","timestamp":1775022081010,"version":"3.50.1"},"reference-count":200,"publisher":"MDPI AG","issue":"20","license":[{"start":{"date-parts":[[2021,10,9]],"date-time":"2021-10-09T00:00:00Z","timestamp":1633737600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100011354","name":"State Key Laboratory of Geo-Information Engineering","doi-asserted-by":"publisher","award":["SKLGIE2019-Z-3-1"],"award-info":[{"award-number":["SKLGIE2019-Z-3-1"]}],"id":[{"id":"10.13039\/501100011354","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fundamental Research Funds of Beijing University of Civil Engineering and Architecture","award":["X18063"],"award-info":[{"award-number":["X18063"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41601409"],"award-info":[{"award-number":["41601409"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41971350"],"award-info":[{"award-number":["41971350"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004826","name":"Beijing Natural Science Foundation","doi-asserted-by":"publisher","award":["8172016"],"award-info":[{"award-number":["8172016"]}],"id":[{"id":"10.13039\/501100004826","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Open Research Fund Program of LIESMARS","award":["19E01"],"award-info":[{"award-number":["19E01"]}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of 
China","doi-asserted-by":"publisher","award":["2018YFC0807806"],"award-info":[{"award-number":["2018YFC0807806"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"BUCEA Post Graduate Innovation Project","award":["31081021004"],"award-info":[{"award-number":["31081021004"]}]},{"name":"Beijing Advanced Innovation Centre for Future Urban Design Project","award":["UDC2019031724"],"award-info":[{"award-number":["UDC2019031724"]}]},{"name":"Teacher Support Program for Pyramid Talent Training Project of Beijing University of Civil Engineering and Architecture","award":["JDJQ20200307"],"award-info":[{"award-number":["JDJQ20200307"]}]},{"name":"Open Research Fund Program of Key Laboratory of Digital Mapping and Land Information Application, Ministry of Natural Resources","award":["ZRZYBWD202102"],"award-info":[{"award-number":["ZRZYBWD202102"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Recently, researchers have realized a number of achievements involving deep-learning-based neural networks for the tasks of segmentation and detection based on 2D images, 3D point clouds, etc. Using 2D and 3D information fusion for the advantages of compensation and accuracy improvement has become a hot research topic. However, there are no critical reviews focusing on the fusion strategies of 2D and 3D information integration based on various data for segmentation and detection, which are the basic tasks of computer vision. To boost the development of this research domain, the existing representative fusion strategies are collected, introduced, categorized, and summarized in this paper. In addition, the general structures of different kinds of fusion strategies were firstly abstracted and categorized, which may inspire researchers. 
Moreover, according to the methods included in this paper, the 2D information and 3D information of different methods come from various kinds of data. Furthermore, suitable datasets are introduced and comparatively summarized to support the relative research. Last but not least, we put forward some open challenges and promising directions for future research.<\/jats:p>","DOI":"10.3390\/rs13204029","type":"journal-article","created":{"date-parts":[[2021,10,10]],"date-time":"2021-10-10T21:37:49Z","timestamp":1633901869000},"page":"4029","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["The Fusion Strategy of 2D and 3D Information Based on Deep Learning: A Review"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2712-9141","authenticated-orcid":false,"given":"Jianghong","family":"Zhao","sequence":"first","affiliation":[{"name":"State Key Laboratory of Geo-Information Engineering, Xi\u2019an 710054, China"},{"name":"School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China"},{"name":"Key Laboratory of Modern Urban Surveying and Mapping, National Administration of Surveying, Mapping and Geoinformation, Beijing 102616, China"},{"name":"Beijing Key Laboratory for Architectural Heritage Fine Reconstruction & Health Monitoring, Beijing 102616, China"},{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9546-8738","authenticated-orcid":false,"given":"Yinrui","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China"},{"name":"Beijing Key Laboratory for Architectural Heritage Fine Reconstruction & Health Monitoring, Beijing 102616, 
China"}]},{"given":"Yuee","family":"Cao","sequence":"additional","affiliation":[{"name":"School of Environment and Geographical Sciences, Shanghai Normal University, Shanghai 200234, China"}]},{"given":"Ming","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China"},{"name":"Beijing Key Laboratory for Architectural Heritage Fine Reconstruction & Health Monitoring, Beijing 102616, China"}]},{"given":"Xianfeng","family":"Huang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China"}]},{"given":"Ruiju","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China"},{"name":"Beijing Key Laboratory for Architectural Heritage Fine Reconstruction & Health Monitoring, Beijing 102616, China"}]},{"given":"Xintong","family":"Dou","sequence":"additional","affiliation":[{"name":"School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China"},{"name":"Beijing Key Laboratory for Architectural Heritage Fine Reconstruction & Health Monitoring, Beijing 102616, China"}]},{"given":"Xinyu","family":"Niu","sequence":"additional","affiliation":[{"name":"School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China"},{"name":"Beijing Key Laboratory for Architectural Heritage Fine Reconstruction & Health Monitoring, Beijing 102616, China"}]},{"given":"Yuanyuan","family":"Cui","sequence":"additional","affiliation":[{"name":"School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, 
China"},{"name":"Beijing Key Laboratory for Architectural Heritage Fine Reconstruction & Health Monitoring, Beijing 102616, China"}]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[{"name":"Culture Development Research Institute, School of Humanities, Beijing University of Civil Engineering and Architecture, Beijing 102616, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,10,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Dong, S., Wang, P., and Abbas, K. (2021). A survey on deep learning and its applications. Comput. Sci. Rev., 40.","DOI":"10.1016\/j.cosrev.2021.100379"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Bello, S.A., Yu, S., Wang, C., Adam, J.M., and Li, J. (2020). Review: Deep learning on 3D point clouds. Remote. Sens., 12.","DOI":"10.3390\/rs12111729"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Guo, Z., Huang, Y., Hu, X., Wei, H., and Zhao, B. (2021). A survey on deep learning based approaches for scene understanding in autonomous driving. Electronics, 10.","DOI":"10.3390\/electronics10040471"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Arshad, S., and Kim, G.-W. (2021). Role of deep learning in loop closure detection for visual and lidar SLAM: A survey. Sensors, 21.","DOI":"10.3390\/s21041243"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Yuan, X., Shi, J., and Gu, L. (2021). A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl., 169.","DOI":"10.1016\/j.eswa.2020.114417"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1007\/s11036-020-01672-7","article-title":"A review of deep learning on medical image analysis","volume":"26","author":"Wang","year":"2021","journal-title":"Mob. Netw. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, X., Song, L., Liu, S., and Zhang, Y. (2021). 
A review of deep-learning-based medical image segmentation methods. Sustainability, 13.","DOI":"10.3390\/su13031224"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1016\/j.comcom.2020.01.016","article-title":"Deep learning and big data technologies for IoT security","volume":"151","author":"Amanullah","year":"2020","journal-title":"Comput. Commun."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1109\/MGRS.2019.2937630","article-title":"Linking points with labels in 3D: A review of point cloud semantic segmentation","volume":"8","author":"Xie","year":"2020","journal-title":"IEEE Geosci. Remote. Sens. Mag."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., and Terzopoulos, D. (2021). Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2021.3059968"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3212","DOI":"10.1109\/TNNLS.2018.2876865","article-title":"Object detection with deep learning: A review","volume":"30","author":"Zhao","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1152","DOI":"10.1109\/JSEN.2020.3020626","article-title":"Deep 3D object detection networks using LiDAR data: A review","volume":"21","author":"Wu","year":"2021","journal-title":"IEEE Sens. J."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.neucom.2020.12.089","article-title":"Deep learning for monocular depth estimation: A review","volume":"438","author":"Ming","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yao, G., Yilmaz, A., Meng, F., and Zhang, L. (2021). Review of wide-baseline stereo image matching based on deep learning. 
Remote Sens., 13.","DOI":"10.3390\/rs13163247"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Raj, T., Hashim, F.H., Huddin, A.B., Ibrahim, M.F., and Hussain, A. (2020). A survey on LiDAR scanning mechanisms. Electronics, 9.","DOI":"10.3390\/electronics9050741"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Bi, S., Yuan, C., Liu, C., Cheng, J., Wang, W., and Cai, Y. (2021). A survey of low-cost 3D laser scanning technology. Appl. Sci., 11.","DOI":"10.3390\/app11093938"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/19479832.2016.1160960","article-title":"Advances in fusion of optical imagery and LiDAR point cloud applied to photogrammetry and remote sensing","volume":"8","author":"Zhang","year":"2017","journal-title":"Int. J. Image Data Fusion"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2847","DOI":"10.1109\/ACCESS.2019.2962554","article-title":"Multi-sensor fusion in automated driving: A survey","volume":"8","author":"Wang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Debeunne, C., and Vivet, D. (2020). A review of visual-LiDAR fusion based simultaneous localization and mapping. Sensors, 20.","DOI":"10.3390\/s20072068"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Fayyad, J., Jaradat, M.A., Gruyer, D., and Najjaran, H. (2020). Deep learning sensor fusion for autonomous vehicle perception and localization: A review. Sensors, 20.","DOI":"10.3390\/s20154220"},{"key":"ref_21","unstructured":"Cui, Y., Chen, R., Chu, W., Chen, L., Tian, D., Li, Y., and Cao, D. (2021). Deep learning for image and point cloud fusion in autonomous driving: A review. IEEE Trans. Intell. Transp. 
Syst., 1\u201318."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_23","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"2","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going deeper with convolutions. arXiv.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_25","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, K.M., Zhang, X.Y., Ren, S.Q., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_30","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016). R-fcn: Object detection via region-based fully convolutional networks. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yoo, D., Park, S., Lee, J.-Y., Paek, A.S., and Kweon, I.S. (2015, January 7\u201313). AttentionNet: Aggregating weak directions for accurate object detection. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.305"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_37","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_38","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_39","unstructured":"Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv."},{"key":"ref_40","first-page":"109","article-title":"Efficient inference in fully connected crfs with gaussian edge potentials","volume":"24","author":"Koltun","year":"2011","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_41","unstructured":"Liu, W., Rabinovich, A., and Berg, A.C. (2015). Parsenet: Looking wider to see better. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Pinheiro, P.O., Lin, T.-Y., Collobert, R., and Doll\u00e1r, P. (2016). Learning to refine object segments. Lecture Notes in Computer Science, Springer.","DOI":"10.1007\/978-3-319-46448-0_5"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., Taylor, G.W., and Fergus, R. (2011, January 6\u201313). 
Adaptive deconvolutional networks for mid and high level feature learning. Proceedings of the 2011 IEEE International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126474"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., and Fergus, R. (2014). Visualizing and understanding convolutional networks. Computer Vision \u2013 ECCV 2014, ECCV 2014, Lecture Notes in Computer Science, Springer.","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_48","unstructured":"Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E. (2016). Enet: A deep neural network architecture for real-time semantic segmentation. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Guo, Y., Wang, H., Hu, Q., Liu, H., Liu, L., and Bennamoun, M. (2020). Deep learning for 3D point clouds: A survey. IEEE Trans. Pattern Anal. Mach. 
Intell.","DOI":"10.1109\/TPAMI.2020.3005434"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Liu, W., Sun, J., Li, W., Hu, T., and Wang, P. (2019). Deep learning on point clouds and its application: A survey. Sensors, 19.","DOI":"10.3390\/s19194188"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"179118","DOI":"10.1109\/ACCESS.2019.2958671","article-title":"A review of deep learning-based semantic segmentation for point cloud","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. (2015, January 7\u201313). Multi-view convolutional neural networks for 3D shape recognition. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.114"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Yang, Z., and Wang, L. (November, January 27). Learning relationships for multi-view 3D object recognition. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00760"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Wei, X., Yu, R., and Sun, J. (2020, January 16\u201318). View-GCN: View-based graph convolutional network for 3D shape analysis. Proceedings of the CVPR 2020: IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00192"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Maturana, D., and Scherer, S. (October, January 28). Voxnet: A 3D convolutional neural network for real-time object recognition. Proceedings of the IROS 2015\u2014IEEE\/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany.","DOI":"10.1109\/IROS.2015.7353481"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Riegler, G., Ulusoy, A.O., and Geiger, A. 
(2017, January 21\u201326). Octnet: Learning deep 3D representations at high resolutions. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.701"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"1578","DOI":"10.1109\/TPAMI.2019.2954885","article-title":"Image-based 3D object reconstruction: State-of-the-art and trends in the deep learning era","volume":"43","author":"Han","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_58","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017). PointNet: Deep learning on point sets for 3D classification and segmentation. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Wu, W., Qi, Z., and Fuxin, L. (2019, January 15\u201321). Pointconv: Deep convolutional networks on 3D point clouds. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00985"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Xu, Y., Fan, T., Xu, M., Zeng, L., and Qiao, Y. (2018). Spidercnn: Deep learning on point sets with parameterized convolutional filters. Computer Science Logic, Springer.","DOI":"10.1007\/978-3-030-01237-3_6"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Simonovsky, M., and Komodakis, N. (2017, January 21\u201326). Dynamic edge-conditioned filters in convolutional neural networks on graphs. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.11"},{"key":"ref_62","first-page":"1","article-title":"Dynamic graph CNN for learning on point clouds","volume":"38","author":"Wang","year":"2019","journal-title":"ACM Trans. Graph."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 15\u201321). 
Pointrcnn: 3D object proposal generation and detection from point cloud. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_64","unstructured":"Zarzar, J., Giancola, S., and Ghanem, B. (2019). Pointrgcn: Graph convolution networks for 3D vehicles detection refinement. arXiv."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (November, January 27). STD: Sparse-to-dense 3D object detector for point cloud. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00204"},{"key":"ref_66","unstructured":"Lehner, J., Mitterecker, A., Adler, T., Hofmarcher, M., Nessler, B., and Hochreiter, S. (2019). Patch refinement-localized 3D object detection. arXiv."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Litany, O., He, K., and Guibas, L. (November, January 27). Deep hough voting for 3D object detection in point clouds. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00937"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Li, B., Zhang, T., and Xia, T. (2016). Vehicle detection from 3D lidar using fully convolutional network. arXiv.","DOI":"10.15607\/RSS.2016.XII.042"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., and Jia, J. (2020, January 16\u201318). 3DSSD: Point-based 3D single stage object detector. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01105"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Lawin, F.J., Danelljan, M., Tosteberg, P., Bhat, G., Khan, F.S., and Felsberg, M. (2017). Deep projective 3D semantic segmentation. 
Programming Languages and Systems, Springer Science and Business Media.","DOI":"10.1007\/978-3-319-64689-3_8"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Wu, B., Wan, A., Yue, X., and Keutzer, K. (2018, January 21\u201325). Squeezeseg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D lidar point cloud. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Queensland, AU.","DOI":"10.1109\/ICRA.2018.8462926"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Tchapmi, L., Choy, C., Armeni, I., Gwak, J., and Savarese, S. (2017, January 10\u201312). SEGCloud: Semantic segmentation of 3D point clouds. Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China.","DOI":"10.1109\/3DV.2017.00067"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Rethage, D., Wald, J., Sturm, J., Navab, N., and Tombari, F. (2018). Fully-convolutional point networks for large-scale point clouds. Advances in Knowledge Discovery and Data Mining, Springer.","DOI":"10.1007\/978-3-030-01225-0_37"},{"key":"ref_74","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems. arXiv."},{"key":"ref_75","unstructured":"Li, Y., Bu, R., Sun, M., Wu, W., Di, X., and Chen, B. (2018). Pointcnn: Convolution on X-transformed points. arXiv."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.neucom.2018.09.008","article-title":"DGCNN: Disordered graph convolutional neural network based on the Gaussian mixture model","volume":"321","author":"Wu","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Landrieu, L., and Simonovsky, M. (2018, January 18\u201323). Large-scale point cloud semantic segmentation with superpoint graphs. 
Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00479"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Yi, L., Zhao, W., Wang, H., Sung, M., and Guibas, L.J. (2019, January 16\u201320). GSPN: Generative shape proposal network for 3D instance segmentation in point cloud. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00407"},{"key":"ref_79","unstructured":"Yang, B., Wang, J., Clark, R., Hu, Q., Wang, S., Markham, A., and Trigoni, N. (2019). Learning object bounding boxes for 3D instance segmentation on point clouds. arXiv."},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Wang, W., Yu, R., Huang, Q., and Neumann, U. (2018, January 18\u201323). SGPN: Similarity group proposal network for 3D point cloud instance segmentation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00272"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Wang, X., Liu, S., Shen, X., Shen, C., and Jia, J. (2019, January 16\u201320). Associatively segmenting instances and semantics in point clouds. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00422"},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Lai, K., Bo, L., Ren, X., and Fox, D. (2011, January 9\u201313). A large-scale hierarchical multi-view RGB-D object dataset. Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China.","DOI":"10.1109\/ICRA.2011.5980382"},{"key":"ref_83","unstructured":"Koppula, H.S., Anand, A., Joachims, T., and Saxena, A. (2011, January 12\u201317). Semantic labeling of 3D point clouds for indoor scenes. 
Proceedings of the Neural Information Processing Systems, Granada, Spain."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Janoch, A., Karayev, S., Jia, Y., Barron, J.T., Fritz, M., Saenko, K., and Darrell, T. (2013). A category-level 3D object dataset: Putting the kinect to work. RGB-D Image Analysis and Processing, Springer Science and Business Media.","DOI":"10.1007\/978-1-4471-4640-7_8"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Susanto, W., Rohrbach, M., and Schiele, B. (2012). 3D object detection with multiple kinects. Programming Languages and Systems, Springer Science and Business Media.","DOI":"10.1007\/978-3-642-33868-7_10"},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Silberman, N., and Fergus, R. (2011, January 6\u201311). Indoor scene segmentation using a structured light sensor. Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130298"},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012, January 7\u201313). Indoor segmentation and support inference from rgbd images. Proceedings of the European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Song, X., Shao, X., Shibasaki, R., and Zhao, H. (2013, January 23\u201328). Category modeling from just a single labeling: Use depth information to guide the learning of 2D models. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.32"},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Xiao, J., Owens, A., and Torralba, A. (2013, January 1\u20138). SUN3D: A database of big spaces reconstructed using SfM and object labels. 
Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia.","DOI":"10.1109\/ICCV.2013.458"},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Lai, K., Bo, L., and Fox, D. (June, January 31). Unsupervised feature learning for 3D scene labeling. Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China.","DOI":"10.1109\/ICRA.2014.6907298"},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Song, S., Lichtenberg, S.P., and Xiao, J. (2015, January 7\u201312). Sun rgb-d: A rgb-d scene understanding benchmark suite. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298655"},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"1681","DOI":"10.1177\/0278364915596058","article-title":"Vidrilo: The visual and depth robot indoor localization with objects information dataset","volume":"34","author":"Cazorla","year":"2015","journal-title":"Int. J. Robot. Res."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Hua, B.-S., Pham, Q.-H., Nguyen, D.T., Tran, M.-K., Yu, L.-F., and Yeung, S.-K. (2016, January 25\u201328). Scenenn: A scene meshes dataset with annotations. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.18"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., and Cipolla, R. (2015). Scenenet: Understanding real world indoor scenes with synthetic data. arXiv.","DOI":"10.1109\/CVPR.2016.442"},{"key":"ref_95","unstructured":"McCormac, J., Handa, A., Leutenegger, S., and Davison, A.J. (2016). Scenenet rgb-d: 5 M photorealistic images of synthetic indoor trajectories with ground truth. 
arXiv."},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Georgakis, G., Reza, M.A., Mousavian, A., Le, P.-H., and Ko\u0161eck\u00e1, J. (2016, January 25\u201328). Multiview RGB-D dataset for object instance detection. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.52"},{"key":"ref_97","doi-asserted-by":"crossref","unstructured":"Chang, A., Dai, A., Funkhouser, T., Halber, M., Nie\u00dfner, M., Savva, M., Song, S., Zeng, A., and Zhang, Y. (2017, January 10\u201312). Matterport3D: Learning from RGB-D data in indoor environments. Proceedings of the International Conference 3D Vision 2017, Qingdao, China.","DOI":"10.1109\/3DV.2017.00081"},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Tombari, F., Di Stefano, L., and Giardino, S. (2011, January 25\u201330). Online learning for automatic segmentation of 3D data. Proceedings of the 2011 IEEE\/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA.","DOI":"10.1109\/IROS.2011.6048294"},{"key":"ref_99","unstructured":"Vasiljevic, I., Kolkin, N., Zhang, S., Luo, R., Wang, H., Dai, F.Z., Daniele, A.F., Mostajabi, M., Basart, S., and Walter, M.R. (2019). Diode: A dense indoor and outdoor depth dataset. arXiv."},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Armeni, I., Sener, O., Zamir, A.R., Jiang, H., Brilakis, I., Fischer, M., and Savarese, S. (2016, January 27\u201330). 3D semantic parsing of large-scale indoor spaces. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.170"},{"key":"ref_101","unstructured":"Armeni, I., Sax, S., Zamir, A.R., and Savarese, S. (2017). Joint 2D-3D-semantic data for indoor scene understanding. arXiv."},{"key":"ref_102","doi-asserted-by":"crossref","unstructured":"Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., and Niessner, M. (2017, January 21\u201326).
Scannet: Richly-annotated 3D reconstructions of indoor scenes. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.261"},{"key":"ref_103","doi-asserted-by":"crossref","unstructured":"Sun, X., Xie, Y., Luo, P., and Wang, L. (2017, January 21\u201326). A Dataset for Benchmarking Image-Based Localization. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.598"},{"key":"ref_104","unstructured":"Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., and Su, H. (2015). Shapenet: An information-rich 3D model repository. arXiv."},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Uy, M.A., Pham, Q.-H., Hua, B.-S., Nguyen, T., and Yeung, S.-K. (November, January 27). Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00167"},{"key":"ref_106","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_107","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The KITTI dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_108","doi-asserted-by":"crossref","unstructured":"Ros, G., Ramos, S., Granados, M., Bakhtiary, A., Vazquez, D., and L\u00f3pez, A. (2015, January 5\u20139). Vision-based offline-online perception paradigm for autonomous driving. 
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2015.38"},{"key":"ref_109","doi-asserted-by":"crossref","unstructured":"Zhang, R., Candra, S.A., Vetter, K., and Zakhor, A. (2015, January 25\u201330). Sensor fusion for semantic segmentation of urban scenes. Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA.","DOI":"10.1109\/ICRA.2015.7139439"},{"key":"ref_110","doi-asserted-by":"crossref","unstructured":"Behley, J., Garbade, M., Milioto, A., Quenzel, J., Behnke, S., Stachniss, C., and Gall, J. (November, January 27). Semantickitti: A dataset for semantic scene understanding of LiDAR sequences. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00939"},{"key":"ref_111","doi-asserted-by":"crossref","unstructured":"Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. (2020, January 13\u201319). nuScenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"ref_112","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.patrec.2021.06.004","article-title":"Semantic segmentation on Swiss3DCities: A benchmark study on aerial photogrammetric 3D pointcloud dataset","volume":"150","author":"Can","year":"2021","journal-title":"Pattern Recognit. Lett."},{"key":"ref_113","unstructured":"Geyer, J., Kassahun, Y., Mahmudi, M., Ricou, X., Durgesh, R., Chung, A.S., Hauswald, L., Pham, V.H., M\u00fchlegg, M., and Dorn, S. (2020). A2D2: Audi autonomous driving dataset. arXiv."},{"key":"ref_114","doi-asserted-by":"crossref","unstructured":"Tan, W., Qin, N., Ma, L., Li, Y., Du, J., Cai, G., Yang, K., and Li, J. (2020, January 14\u201319). 
Toronto-3D: A Large-scale Mobile LiDAR dataset for semantic segmentation of urban roadways. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00109"},{"key":"ref_115","doi-asserted-by":"crossref","unstructured":"Hackel, T., Savinov, N., Ladicky, L., Wegner, J.D., Schindler, K., and Pollefeys, M. (2017). Semantic3d.Net: A new large-scale point cloud classification benchmark. arXiv.","DOI":"10.5194\/isprs-annals-IV-1-W1-91-2017"},{"key":"ref_116","doi-asserted-by":"crossref","first-page":"87695","DOI":"10.1109\/ACCESS.2020.2992612","article-title":"CSPC-Dataset: New lidar point cloud dataset and benchmark for large-scale scene semantic segmentation","volume":"8","author":"Tong","year":"2020","journal-title":"IEEE Access"},{"key":"ref_117","unstructured":"Weng, X., Man, Y., Cheng, D., Park, J., O\u2019Toole, M., Kitani, K., Wang, J., and Held, D. (2021, May 18). All-in-One Drive: A Large-Scale Comprehensive Perception Dataset with High-Density Long-Range Point Clouds. Available online: https:\/\/www.researchgate.net\/publication\/347112693_All-In-One_Drive_A_Large-Scale_Comprehensive_Perception_Dataset_with_High-Density_Long-Range_Point_Clouds."},{"key":"ref_118","doi-asserted-by":"crossref","unstructured":"Chang, M.-F., Ramanan, D., Hays, J., Lambert, J., Sangkloy, P., Singh, J., Bak, S., Hartnett, A., Wang, D., and Carr, P. (2019, January 15\u201321). Argoverse: 3D tracking and forecasting with rich maps. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00895"},{"key":"ref_119","doi-asserted-by":"crossref","first-page":"2702","DOI":"10.1109\/TPAMI.2019.2926463","article-title":"The Apolloscape open dataset for autonomous driving and its application","volume":"42","author":"Huang","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_120","doi-asserted-by":"crossref","unstructured":"Gaidon, A., Wang, Q., Cabon, Y., and Vig, E. (July, January 26). Virtual worlds as proxy for multi-object tracking analysis. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.470"},{"key":"ref_121","unstructured":"Fang, J., Yan, F., Zhao, T., Zhang, F., Zhou, D., Yang, R., Ma, Y., and Wang, L. (2018). Simulating lidar point cloud for autonomous driving using real-world scenes and traffic flows. arXiv."},{"key":"ref_122","unstructured":"Yi, L., Shao, L., Savva, M., Huang, H., Zhou, Y., Wang, Q., Graham, B., Engelcke, M., Klokov, R., and Lempitsky, V. (2017). Large-scale 3D shape reconstruction and segmentation from shapenet core55. arXiv."},{"key":"ref_123","doi-asserted-by":"crossref","unstructured":"Mo, K., Zhu, S., Chang, A.X., Yi, L., Tripathi, S., Guibas, L.J., and Su, H. (2019, January 15\u201321). PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00100"},{"key":"ref_124","unstructured":"Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015, January 7\u201312). 3D Shapenets: A deep representation for volumetric shapes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_125","doi-asserted-by":"crossref","unstructured":"Richtsfeld, A., Morwald, T., Prankl, J., Zillich, M., and Vincze, M. (2012, January 7\u201312). Segmentation of unknown objects in indoor environments.
Proceedings of the 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal.","DOI":"10.1109\/IROS.2012.6385661"},{"key":"ref_126","unstructured":"Taghanaki, S.A., Luo, J., Zhang, R., Wang, Y., Jayaraman, P.K., and Jatavallabhula, K.M. (2020). Robust point set: A dataset for benchmarking robustness of point cloud classifiers. arXiv."},{"key":"ref_127","unstructured":"De Deuge, M., Quadros, A., Hung, C., and Douillard, B. (2013, January 2\u20134). Unsupervised feature learning for classification of outdoor 3D scans. Proceedings of the Australasian Conference on Robotics and Automation, Sydney, New South Wales, AU."},{"key":"ref_128","unstructured":"Serna, A., Marcotegui, B., Goulette, F., and Deschaud, J.-E. (2014, January 6). Paris-rue-madame database\u2014A 3D mobile laser scanner dataset for benchmarking urban detection, segmentation and classification methods. Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods ICPRAM 2014, Angers, France."},{"key":"ref_129","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1016\/j.cag.2015.03.004","article-title":"Terra mobilita\/iQmulus urban point cloud analysis benchmark","volume":"49","author":"Vallet","year":"2015","journal-title":"Comput. Graph."},{"key":"ref_130","doi-asserted-by":"crossref","unstructured":"Roynard, X., Deschaud, J.-E., and Goulette, F. (2018, January 18\u201322). Paris-lille-3D: A point cloud dataset for urban scene segmentation and classification. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00272"},{"key":"ref_131","doi-asserted-by":"crossref","unstructured":"Wang, Y., Tan, X., Yang, Y., Liu, X., Ding, E., Zhou, F., and Davis, L.S. (2019). 3D pose estimation for fine-grained object categories.
Transactions on Petri Nets and Other Models of Concurrency XV, Springer Science and Business Media.","DOI":"10.1007\/978-3-030-11009-3_38"},{"key":"ref_132","doi-asserted-by":"crossref","first-page":"35984","DOI":"10.1109\/ACCESS.2021.3062547","article-title":"Annotation tool and urban dataset for 3D point cloud semantic segmentation","volume":"9","author":"Ibrahim","year":"2021","journal-title":"IEEE Access"},{"key":"ref_133","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1016\/j.isprsjprs.2013.10.004","article-title":"Results of the ISPRS benchmark on urban object detection and 3D building reconstruction","volume":"93","author":"Rottensteiner","year":"2014","journal-title":"ISPRS J. Photogramm. Remote. Sens."},{"key":"ref_134","unstructured":"Zolanvari, S., Ruano, S., Rana, A., Cummins, A., da Silva, R.E., Rahbar, M., and Smolic, A. (2019). Dublin city: Annotated lidar point cloud and its applications. arXiv."},{"key":"ref_135","doi-asserted-by":"crossref","unstructured":"Hu, Q., Yang, B., Khalid, S., Xiao, W., Trigoni, N., and Markham, A. (2021, January 19\u201325). Towards semantic segmentation of urban-scale 3D point clouds: A dataset, benchmarks and challenges. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online Conference.","DOI":"10.1109\/CVPR46437.2021.00494"},{"key":"ref_136","doi-asserted-by":"crossref","unstructured":"Varney, N., Asari, V.K., and Graehling, Q. (2020, January 14\u201319). Dales: A large-scale aerial lidar data set for semantic segmentation. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Online Conference.","DOI":"10.1109\/CVPRW50498.2020.00101"},{"key":"ref_137","doi-asserted-by":"crossref","unstructured":"Ye, Z., Xu, Y., Huang, R., Tong, X., Li, X., Liu, X., Luan, K., Hoegner, L., and Stilla, U. (2020). Lasdu: A large-scale aerial lidar dataset for semantic labeling in dense urban areas. ISPRS Int. J. 
Geo-Inf., 9.","DOI":"10.3390\/ijgi9070450"},{"key":"ref_138","doi-asserted-by":"crossref","unstructured":"Li, X., Li, C., Tong, Z., Lim, A., Yuan, J., Wu, Y., Tang, J., and Huang, R. (2020, January 12\u201316). Campus3d: A photogrammetry point cloud benchmark for hierarchical understanding of outdoor scene. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413661"},{"key":"ref_139","doi-asserted-by":"crossref","unstructured":"Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., and Caine, B. (2020, January 14\u201319). Scalability in perception for autonomous driving: Waymo open dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Online Conference.","DOI":"10.1109\/CVPR42600.2020.00252"},{"key":"ref_140","doi-asserted-by":"crossref","unstructured":"Wulff, F., Schaufele, B., Sawade, O., Becker, D., Henke, B., and Radusch, I. (July, January 30). Early fusion of camera and lidar for robust road detection based on U-net fcn. Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China.","DOI":"10.1109\/IVS.2018.8500549"},{"key":"ref_141","doi-asserted-by":"crossref","unstructured":"Erkent, O., Wolf, C., Laugier, C., Gonzalez, D.S., and Cano, V.R. (2018, January 1\u20135). Semantic grid estimation with a hybrid bayesian and deep neural network approach. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593434"},{"key":"ref_142","doi-asserted-by":"crossref","unstructured":"Zhou, K., Ming, D., Lv, X., Fang, J., and Wang, M. (2019). CNN-based land cover classification combining stratified segmentation and fusion of point cloud and very high-spatial resolution remote sensing image Data. Remote. 
Sens., 11.","DOI":"10.3390\/rs11172065"},{"key":"ref_143","doi-asserted-by":"crossref","first-page":"5802","DOI":"10.1109\/TITS.2020.2988302","article-title":"Fast road detection by cnn-based camera\u2013lidar fusion and spherical coordinate transformation","volume":"22","author":"Lee","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_144","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1109\/TIV.2018.2843170","article-title":"3-D LiDAR + monocular camera: An inverse-depth-induced fusion framework for urban road detection","volume":"3","author":"Gu","year":"2018","journal-title":"IEEE Trans. Intell. Veh."},{"key":"ref_145","doi-asserted-by":"crossref","unstructured":"Gu, S., Zhang, Y., Tang, J., Yang, J., and Kong, H. (2019, January 20\u201324). Road detection through CRF based lidar-camera fusion. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8793585"},{"key":"ref_146","doi-asserted-by":"crossref","unstructured":"Narita, G., Seno, T., Ishikawa, T., and Kaji, Y. (2019, January 4\u20138). Panoptic fusion: Online volumetric semantic mapping at the level of stuff and things. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8967890"},{"key":"ref_147","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.isprsjprs.2018.04.022","article-title":"Fusion of images and point clouds for the semantic segmentation of large-scale 3D scenes based on deep learning","volume":"143","author":"Zhang","year":"2018","journal-title":"ISPRS J. Photogramm. Remote. Sens."},{"key":"ref_148","doi-asserted-by":"crossref","unstructured":"Riemenschneider, H., B\u00f3dis-Szomor\u00fa, A., Weissenberg, J., and Van Gool, L. (2014). Learning where to classify in multi-view semantic segmentation. 
Programming Languages and Systems, Springer Science and Business Media.","DOI":"10.1007\/978-3-319-10602-1_34"},{"key":"ref_149","doi-asserted-by":"crossref","unstructured":"Qi, X., Liao, R., Jia, J., Fidler, S., and Urtasun, R. (2017, January 22\u201329). 3D graph neural networks for RGBD semantic segmentation. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.556"},{"key":"ref_150","doi-asserted-by":"crossref","unstructured":"Gupta, S., Girshick, R., Arbel\u00e1ez, P., and Malik, J. (2014). Learning rich features from RGB-D images for object detection and segmentation. Programming Languages and Systems, Springer Science and Business Media.","DOI":"10.1007\/978-3-319-10584-0_23"},{"key":"ref_151","doi-asserted-by":"crossref","unstructured":"Jaritz, M., De Charette, R., Wirbel, E., Perrotton, X., and Nashashibi, F. (2018, January 5\u20138). Sparse and dense data with CNNs: Depth completion and semantic segmentation. Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00017"},{"key":"ref_152","doi-asserted-by":"crossref","unstructured":"Dai, A., and Nie\u00dfner, M. (2018). 3DMV: Joint 3D-multi-view prediction for 3D semantic scene segmentation. Lecture Notes in Computer Science, Springer.","DOI":"10.1007\/978-3-030-01249-6_28"},{"key":"ref_153","doi-asserted-by":"crossref","unstructured":"Lv, X., Liu, Z., Xin, J., and Zheng, N. (2018). A novel approach for detecting road based on two-stream fusion fully convolutional network. IEEE Intell. Veh. Symp., 1464\u20131469.","DOI":"10.1109\/IVS.2018.8500551"},{"key":"ref_154","doi-asserted-by":"crossref","unstructured":"Yang, F., Yang, J., Jin, Z., and Wang, H. (2018, January 19\u201322). A Fusion model for road detection based on deep learning and fully connected CRF. 
Proceedings of the 13th Annual Conference on System of Systems Engineering (SoSE), Paris, France.","DOI":"10.1109\/SYSOSE.2018.8428696"},{"key":"ref_155","doi-asserted-by":"crossref","unstructured":"Su, H., Jampani, V., Sun, D., Maji, S., Kalogerakis, E., Yang, M.-H., and Kautz, J. (2018, January 18\u201323). Splatnet: Sparse lattice networks for point cloud processing. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00268"},{"key":"ref_156","doi-asserted-by":"crossref","unstructured":"Jaritz, M., Gu, J., and Su, H. (November, January 27). Multi-view pointnet for 3D scene understanding. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00494"},{"key":"ref_157","doi-asserted-by":"crossref","unstructured":"Li, Z., Gan, Y., Liang, X., Yu, Y., Cheng, H., and Lin, L. (2016). LSTM-CF: Unifying context modeling and fusion with LSTMs for RGB-D scene labeling. Machine Learning in Clinical Neuroimaging, Springer.","DOI":"10.1007\/978-3-319-46475-6_34"},{"key":"ref_158","doi-asserted-by":"crossref","first-page":"22475","DOI":"10.1007\/s11042-018-6056-8","article-title":"RGB-D joint modelling with scene geometric information for indoor semantic segmentation","volume":"77","author":"Liu","year":"2018","journal-title":"Multimed. Tools Appl."},{"key":"ref_159","doi-asserted-by":"crossref","unstructured":"Hou, J., Dai, A., and Nie\u00dfner, M. (2019, January 16\u201320). 3D-SIS: 3D semantic instance segmentation of RGB-D scans. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00455"},{"key":"ref_160","doi-asserted-by":"crossref","unstructured":"Yu, D., Xiong, H., Xu, Q., Wang, J., and Li, K. (2019, January 9\u201312). Multi-stage residual fusion network for lidar-camera road detection.
Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France.","DOI":"10.1109\/IVS.2019.8813983"},{"key":"ref_161","unstructured":"Li, H., Chen, Y., Zhang, Q., and Zhao, D. (2021). Bifnet: Bidirectional fusion network for road segmentation. IEEE Trans. Cybern., 1\u201312."},{"key":"ref_162","doi-asserted-by":"crossref","unstructured":"Yuan, J., Zhang, K., Xia, Y., and Qi, L. (2018, January 14\u201316). A fusion network for semantic segmentation using RGB-D data. Proceedings of the Ninth International Conference on Graphic and Image Processing (ICGIP), Qingdao, China.","DOI":"10.1117\/12.2304501"},{"key":"ref_163","doi-asserted-by":"crossref","unstructured":"Hu, X., Yang, K., Fei, L., and Wang, K. (2019, January 22\u201325). ACNET: Attention based network to exploit complementary features for RGBD semantic segmentation. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803025"},{"key":"ref_164","doi-asserted-by":"crossref","first-page":"2825","DOI":"10.1109\/TIP.2019.2891104","article-title":"Three-stream attention-aware network for RGB-D salient object detection","volume":"28","author":"Chen","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_165","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1109\/MIS.2020.2999462","article-title":"TSNet: Three-stream self-attention network for RGB-D indoor semantic segmentation","volume":"36","author":"Zhou","year":"2021","journal-title":"IEEE Intell. Syst."},{"key":"ref_166","doi-asserted-by":"crossref","unstructured":"Liu, C., Wu, J., and Furukawa, Y. (2018). FloorNet: A unified framework for floorplan reconstruction from 3D scans. 
Medical Image Computing and Computer-Assisted Intervention, Springer Science and Business Media.","DOI":"10.1007\/978-3-030-01231-1_13"},{"key":"ref_167","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.robot.2018.11.002","article-title":"Lidar\u2014camera fusion for road detection using fully convolutional neural networks","volume":"111","author":"Caltagirone","year":"2019","journal-title":"Robot. Auton. Syst."},{"key":"ref_168","doi-asserted-by":"crossref","unstructured":"Kim, D.-K., Maturana, D., Uenoyama, M., and Scherer, S. (2018). Season-invariant semantic segmentation with a deep multimodal network. Experimental Robotics, Springer.","DOI":"10.1007\/978-3-319-67361-5_17"},{"key":"ref_169","doi-asserted-by":"crossref","unstructured":"Chiang, H.-Y., Lin, Y.-L., Liu, Y.-C., and Hsu, W.H. (2019, January 16\u201319). A Unified point-based framework for 3D segmentation. Proceedings of the 2019 International Conference on 3D Vision (3DV), Montreal, QC, Canada.","DOI":"10.1109\/3DV.2019.00026"},{"key":"ref_170","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1109\/JAS.2019.1911459","article-title":"Progressive lidar adaptation for road detection","volume":"6","author":"Chen","year":"2019","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_171","doi-asserted-by":"crossref","unstructured":"Xu, J., Zhang, R., Dou, J., Zhu, Y., Sun, J., and Pu, S. (2021). Rpvnet: A deep and efficient range-point-voxel fusion network for lidar point cloud segmentation. arXiv.","DOI":"10.1109\/ICCV48922.2021.01572"},{"key":"ref_172","doi-asserted-by":"crossref","unstructured":"Nakajima, Y., Kang, B., Saito, H., and Kitani, K. (November, January 27). Incremental class discovery for semantic segmentation with RGBD sensing. 
Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00106"},{"key":"ref_173","doi-asserted-by":"crossref","unstructured":"Martinovic, A., Knopp, J., Riemenschneider, H., and Van Gool, L. (2015, January 7\u201312). 3D all the way: Semantic segmentation of urban scenes from start to end in 3D. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299075"},{"key":"ref_174","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1016\/j.eswa.2017.07.042","article-title":"Exploiting synergies of mobile mapping sensors and deep learning for traffic sign recognition systems","volume":"89","author":"Riveiro","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_175","doi-asserted-by":"crossref","first-page":"1715","DOI":"10.1109\/JSTARS.2018.2810143","article-title":"Robust traffic-sign detection and classification using mobile lidar data with digital images","volume":"11","author":"Guan","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens."},{"key":"ref_176","doi-asserted-by":"crossref","unstructured":"Barea, R., Perez, C., Bergasa, L.M., Lopez-Guillen, E., Romera, E., Molinos, E., Ocana, M., and Lopez, J. (2018, January 4\u20137). Vehicle detection and localization using 3D lidar point cloud and image semantic segmentation. Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Big Island, HI, USA.","DOI":"10.1109\/ITSC.2018.8569962"},{"key":"ref_177","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1109\/LGRS.2019.2939354","article-title":"A convolutional capsule network for traffic-sign recognition using mobile lidar data with digital images","volume":"17","author":"Guan","year":"2019","journal-title":"IEEE Geosci. Remote. Sens. 
Lett."},{"key":"ref_178","doi-asserted-by":"crossref","unstructured":"Lahoud, J., and Ghanem, B. (2017, January 22\u201329). 2D-driven 3D object detection in RGB-D images. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.495"},{"key":"ref_179","doi-asserted-by":"crossref","unstructured":"Du, X., Ang, M.H., Karaman, S., and Rus, D. (2018, January 21\u201325). A general pipeline for 3D detection of vehicles. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461232"},{"key":"ref_180","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201322). Frustum pointnets for 3D object detection from RGB-D data. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_181","doi-asserted-by":"crossref","first-page":"9267","DOI":"10.1609\/aaai.v33i01.33019267","article-title":"3D object detection using scale invariant and feature reweighting networks","volume":"33","author":"Zhao","year":"2019","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"ref_182","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Jia, K. (2019, January 4\u20138). Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3D object detection. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"ref_183","doi-asserted-by":"crossref","unstructured":"Shin, K., Kwon, Y.P., and Tomizuka, M. (2019, January 9\u201312). Roarnet: A robust 3D object detection based on region approximation refinement.
Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France.","DOI":"10.1109\/IVS.2019.8813895"},{"key":"ref_184","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (2018). Ipod: Intensive point-based object detector for point cloud. arXiv.","DOI":"10.1109\/ICCV.2019.00204"},{"key":"ref_185","doi-asserted-by":"crossref","unstructured":"Vora, S., Lang, A.H., Helou, B., and Beijbom, O. (2020, January 16\u201318). Pointpainting: Sequential fusion for 3D object detection. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"ref_186","doi-asserted-by":"crossref","unstructured":"Song, S., and Xiao, J. (2016, January 27\u201330). Deep sliding shapes for amodal 3D object detection in RGB-D images. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.94"},{"key":"ref_187","doi-asserted-by":"crossref","unstructured":"Deng, Z., and Latecki, L.J. (2017, January 21\u201326). Amodal detection of 3D objects: Inferring 3D bounding boxes from 2D ones in RGB-depth images. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.50"},{"key":"ref_188","doi-asserted-by":"crossref","unstructured":"Wang, Z., Zhan, W., and Tomizuka, M. (July, January 30). Fusing bird\u2019s eye view lidar point cloud and front view camera image for 3D object detection. Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China.","DOI":"10.1109\/IVS.2018.8500387"},{"key":"ref_189","unstructured":"Yang, B., Liang, M., and Urtasun, R. (2018, January 29\u201331). Hdnet: Exploiting hd maps for 3d object detection. 
Proceedings of the Conference on Robot Learning, Zurich, Switzerland."},{"key":"ref_190","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., Zhou, Y., and Tuzel, O. (2019, January 20\u201324). MVX-Net: Multimodal voxelnet for 3D object detection. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794195"},{"key":"ref_191","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Chen, X., Litany, O., and Guibas, L.J. (2020, January 14\u201319). Imvotenet: Boosting 3D object detection in point clouds with image votes. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online Conference.","DOI":"10.1109\/CVPR42600.2020.00446"},{"key":"ref_192","unstructured":"Zhou, Y., Sun, P., Zhang, Y., Anguelov, D., Gao, J., Ouyang, T., Guo, J., Ngiam, J., and Vasudevan, V. (2020, January 8\u201311). End-to-end multi-view fusion for 3d object detection in lidar point clouds. Proceedings of the Conference on Robot Learning, London, UK\/Online Conference."},{"key":"ref_193","doi-asserted-by":"crossref","unstructured":"Xu, B., and Chen, Z. (2018, January 18\u201322). Multi-level fusion based 3D object detection from monocular images. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00249"},{"key":"ref_194","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3D object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_195","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Wang, S., and Urtasun, R. (2018). Deep continuous fusion for multi-sensor 3D object detection. 
Lecture Notes in Computer Science, Springer Science and Business Media.","DOI":"10.1007\/978-3-030-01270-0_39"},{"key":"ref_196","doi-asserted-by":"crossref","unstructured":"Lu, H., Chen, X., Zhang, G., Zhou, Q., Ma, Y., and Zhao, Y. (2019, January 12\u201317). Scanet: Spatial-channel attention network for 3D object detection. Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682746"},{"key":"ref_197","doi-asserted-by":"crossref","unstructured":"Xu, D., Anguelov, D., and Jain, A. (2018, January 18\u201323). PointFusion: Deep sensor fusion for 3D bounding box estimation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00033"},{"key":"ref_198","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Chen, Y., Hu, R., and Urtasun, R. (2019, January 16\u201320). Multi-task multi-sensor fusion for 3D object detection. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00752"},{"key":"ref_199","doi-asserted-by":"crossref","unstructured":"Huang, T., Liu, Z., Chen, X., and Bai, X. (2020). EPNet: Enhancing point features with image semantics for 3D object detection. Computer Vision\u2014ECCV, Springer.","DOI":"10.1007\/978-3-030-58555-6_3"},{"key":"ref_200","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018, January 1\u20135). Joint 3D proposal generation and object detection from view aggregation. 
Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/20\/4029\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:10:58Z","timestamp":1760166658000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/20\/4029"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,9]]},"references-count":200,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["rs13204029"],"URL":"https:\/\/doi.org\/10.3390\/rs13204029","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,9]]}}}