{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:37:14Z","timestamp":1760233034297,"version":"build-2065373602"},"reference-count":67,"publisher":"MDPI AG","issue":"24","license":[{"start":{"date-parts":[[2022,12,12]],"date-time":"2022-12-12T00:00:00Z","timestamp":1670803200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62101576","62171451","61906206","62071478","2020JJ5671","2019jcjqzd00600"],"award-info":[{"award-number":["62101576","62171451","61906206","62071478","2020JJ5671","2019jcjqzd00600"]}]},{"name":"Natural Science Foundation of Hunan Province of China","award":["62101576","62171451","61906206","62071478","2020JJ5671","2019jcjqzd00600"],"award-info":[{"award-number":["62101576","62171451","61906206","62071478","2020JJ5671","2019jcjqzd00600"]}]},{"name":"Key Basic Research Project of the China Basic Strengthening Program","award":["62101576","62171451","61906206","62071478","2020JJ5671","2019jcjqzd00600"],"award-info":[{"award-number":["62101576","62171451","61906206","62071478","2020JJ5671","2019jcjqzd00600"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Image-to-point cloud registration refers to finding relative transformation between the camera and the reference frame of the 3D point cloud, which is critical for autonomous driving. Recently, a two-stage \u201cfrustum point cloud classification + camera pose optimization\u201d pipeline has shown impressive results on this task. This paper focuses on the second stage and reformulates the optimization procedure as a Markov decision process. An initial pose is modified incrementally, sequentially aligning a virtual 3D point observation towards a previous classification solution. 
We consider such an iterative update process as a reinforcement learning task and, to this end, propose a novel agent (AgentI2P) to conduct decision making. To guide AgentI2P, we employ behaviour cloning (BC) and reinforcement learning (RL) techniques: cloning an expert to learn accurate pose movement and reinforcing an alignment reward to improve the policy further. We demonstrate the effectiveness and efficiency of our approach on Oxford Robotcar and KITTI datasets. The (RTE, RRE) metrics are (1.34 m, 1.46\u00b0) on Oxford Robotcar and (3.90 m, 5.94\u00b0) on KITTI, and the inference time is 60 ms, with both accuracy and speed achieving state-of-the-art performance. The source code will be publicly available upon publication of the paper.<\/jats:p>","DOI":"10.3390\/rs14246301","type":"journal-article","created":{"date-parts":[[2022,12,13]],"date-time":"2022-12-13T03:32:32Z","timestamp":1670902352000},"page":"6301","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["AgentI2P: Optimizing Image-to-Point Cloud Registration via Behaviour Cloning and Reinforcement Learning"],"prefix":"10.3390","volume":"14","author":[{"given":"Shen","family":"Yan","sequence":"first","affiliation":[{"name":"Department of System Engineering, National University of Defense Technology, Deya Road 107, Changsha 410000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6748-0545","authenticated-orcid":false,"given":"Maojun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of System Engineering, National University of Defense Technology, Deya Road 107, Changsha 410000, China"}]},{"given":"Yang","family":"Peng","sequence":"additional","affiliation":[{"name":"Department of System Engineering, National University of Defense Technology, Deya Road 107, Changsha 410000, China"}]},{"given":"Yu","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of System Engineering, National University of Defense 
Technology, Deya Road 107, Changsha 410000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8470-263X","authenticated-orcid":false,"given":"Hanlin","family":"Tan","sequence":"additional","affiliation":[{"name":"Department of System Engineering, National University of Defense Technology, Deya Road 107, Changsha 410000, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2074","DOI":"10.1109\/TPAMI.2020.3032010","article-title":"Long-term visual localization revisited","volume":"44","author":"Toft","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Fan, B., Yang, Y., Feng, W., Wu, F., Lu, J., and Liu, H. (2022). Seeing through Darkness: Visual Localization at Night via Weakly Supervised Learning of Domain Invariant Features. IEEE Trans. Multimed.","DOI":"10.1109\/TMM.2022.3154165"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1177\/0278364914561101","article-title":"Real-time monocular image-based 6-DoF localization","volume":"34","author":"Lim","year":"2015","journal-title":"Int. J. Robot. Res."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2820","DOI":"10.1109\/TMM.2020.3017886","article-title":"An Accurate, Robust Visual Odometry and Detail-Preserving Reconstruction System","volume":"23","author":"Gong","year":"2020","journal-title":"IEEE Trans. Multimed."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1253","DOI":"10.1109\/TCSVT.2019.2941040","article-title":"EgoCart: A benchmark dataset for large-scale indoor image-based localization in retail stores","volume":"31","author":"Spera","year":"2019","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Heng, L., Choi, B., Cui, Z., Geppert, M., Hu, S., Kuan, B., Liu, P., Nguyen, R., Yeo, Y.C., and Geiger, A. 
(2019, January 20\u201324). Project autovision: Localization and 3d scene perception for an autonomous vehicle with a multi-camera system. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8793949"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"4804","DOI":"10.1109\/TCSVT.2021.3121987","article-title":"Uav-satellite view synthesis for cross-view geo-localization","volume":"32","author":"Tian","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"867","DOI":"10.1109\/TCSVT.2021.3061265","article-title":"Each part matters: Local patterns facilitate cross-view geo-localization","volume":"32","author":"Wang","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_9","unstructured":"Mann, S., Furness, T., Yuan, Y., Iorio, J., and Wang, Z. (2018). All reality: Virtual, augmented, mixed (x), mediated (x, y), and multimediated reality. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2827","DOI":"10.1109\/TMM.2019.2913324","article-title":"Real-time visual\u2013inertial SLAM based on adaptive keyframe selection for mobile AR applications","volume":"21","author":"Piao","year":"2019","journal-title":"IEEE Trans. Multimed."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., Cadena, C., Siegwart, R., and Dymczyk, M. (2019, January 15\u201320). From coarse to fine: Robust hierarchical localization at large scale. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01300"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., and Frahm, J.M. (2016, January 27\u201330). Structure-from-motion revisited. Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, J., and Lee, G.H. (2021, January 20\u201325). DeepI2P: Image-to-Point Cloud Registration via Deep Classification. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01570"},{"key":"ref_15","first-page":"439","article-title":"Quasi-likelihood functions, generalized linear models, and the Gauss\u2014Newton method","volume":"61","author":"Wedderburn","year":"1974","journal-title":"Biometrika"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Torabi, F., Warnell, G., and Stone, P. (2018). Behavioral cloning from observation. arXiv.","DOI":"10.24963\/ijcai.2018\/687"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Arulkumaran, K., Deisenroth, M.P., Brundage, M., and Bharath, A.A. (2017). A brief survey of deep reinforcement learning. arXiv.","DOI":"10.1109\/MSP.2017.2743240"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., DeTone, D., Malisiewicz, T., and Rabinovich, A. (2020, January 13\u201319). Superglue: Learning feature matching with graph neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00499"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Sun, J., Shen, Z., Wang, Y., Bao, H., and Zhou, X. (2021, January 20\u201325). LoFTR: Detector-free local feature matching with transformers. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00881"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2770","DOI":"10.1109\/TMM.2020.3016122","article-title":"Deep unsupervised binary descriptor learning through locality consistency and self distinctiveness","volume":"23","author":"Fan","year":"2020","journal-title":"IEEE Trans. Multimed."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sattler, T., Leibe, B., and Kobbelt, L. (2011, January 6\u201313). Fast image-based localization using direct 2d-to-3d matching. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126302"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1109\/TRO.2015.2463671","article-title":"ORB-SLAM: A versatile and accurate monocular SLAM system","volume":"31","author":"Montiel","year":"2015","journal-title":"IEEE Trans. Robot."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Balntas, V., Li, S., and Prisacariu, V. (2018, January 8\u201314). Relocnet: Continuous metric learning relocalisation using neural nets. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_46"},{"key":"ref_24","unstructured":"Ding, M., Wang, Z., Sun, J., Shi, J., and Luo, P. (November, January 27). CamNet: Coarse-to-fine retrieval for camera re-localization. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kendall, A., and Cipolla, R. (2017, January 21\u201326). Geometric loss functions for camera pose regression with deep learning. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.694"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kendall, A., Grimes, M., and Cipolla, R. (2015, January 7\u201313). Posenet: A convolutional network for real-time 6-dof camera relocalization. Proceedings of the IEEE International Conference on Computer Vision, Washington, DC, USA.","DOI":"10.1109\/ICCV.2015.336"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Laskar, Z., Melekhov, I., Kalia, S., and Kannala, J. (2017, January 22\u201329). Camera relocalization by computing pairwise relative poses using convolutional neural network. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.113"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Walch, F., Hazirbas, C., Leal-Taixe, L., Sattler, T., Hilsenbeck, S., and Cremers, D. (2017, January 22\u201329). Image-based localization using lstms for structured feature correlation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.75"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhou, Q., Sattler, T., Pollefeys, M., and Leal-Taixe, L. (August, January 31). To learn or not to learn: Visual localization from essential matrices. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9196607"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S., and Rother, C. (2017, January 21\u201326). Dsac-differentiable ransac for camera localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.267"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Brachmann, E., and Rother, C. 
(2018, January 18\u201323). Learning less is more-6d camera localization via 3d surface regression. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00489"},{"key":"ref_32","unstructured":"Brachmann, E., and Rother, C. (November, January 27). Expert sample consensus applied to camera re-localization. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_33","first-page":"5847","article-title":"Visual camera re-localization from RGB and RGB-D images using DSAC","volume":"44","author":"Brachmann","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Cavallari, T., Golodetz, S., Lord, N.A., Valentin, J., Di Stefano, L., and Torr, P.H. (2017, January 21\u201326). On-the-fly adaptation of regression forests for online camera relocalisation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.31"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, S., Zhao, Y., Verbeek, J., and Kannala, J. (2020, January 13\u201319). Hierarchical scene coordinate classification and regression for visual localization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01200"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., and Fitzgibbon, A. (2013, January 23\u201328). Scene coordinate regression forests for camera relocalization in RGB-D images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.377"},{"key":"ref_37","unstructured":"Yang, L., Bai, Z., Tang, C., Li, H., Furukawa, Y., and Tan, P. (November, January 27). 
Sanet: Scene agnostic network for camera localization. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Feng, M., Hu, S., Ang, M.H., and Lee, G.H. (2019, January 20\u201324). 2d3d-matchnet: Learning to match keypoints across 2d image and 3d point cloud. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794415"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, B., Chen, C., Cui, Z., Qin, J., Lu, C.X., Yu, Z., Zhao, P., Dong, Z., Zhu, F., and Trigoni, N. (2021). P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching. arXiv.","DOI":"10.1109\/ICCV48922.2021.01570"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Yu, H., Zhen, W., Yang, W., Zhang, J., and Scherer, S. (2020\u201324, January 24). Monocular camera localization in prior lidar maps with 2d-3d line correspondences. Proceedings of the 2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA.","DOI":"10.1109\/IROS45743.2020.9341690"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Cattaneo, D., Vaghi, M., Fontana, S., Ballardini, A.L., and Sorrenti, D.G. (August, January 31). Global visual localization in LiDAR-maps through shared 2D-3D embedding space. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9196859"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Pham, Q.H., Uy, M.A., Hua, B.S., Nguyen, D.T., Roig, G., and Yeung, S.K. (2020, January 7\u201312). Lcd: Learned cross-domain descriptors for 2d-3d matching. 
Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6859"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Sun, X., Zha, Z.J., and Zeng, W. (2019, January 15\u201320). Context-reinforced semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00417"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Mathe, S., Pirinen, A., and Sminchisescu, C. (2016, January 27\u201330). Reinforcement learning for visual object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.316"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Bauer, D., Patten, T., and Vincze, M. (2021, January 20\u201325). ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01435"},{"key":"ref_46","unstructured":"Busam, B., Jung, H.J., and Navab, N. (2020). I like to move it: 6d pose estimation as an action decision process. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Shao, J., Jiang, Y., Wang, G., Li, Z., and Ji, X. (2020, January 13\u201319). Pfrl: Pose-free reinforcement learning for 6d pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01147"},{"key":"ref_48","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_49","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). 
Proximal policy optimization algorithms. arXiv."},{"key":"ref_50","unstructured":"Bojarski, M., Testa, D.D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L.D., Monfort, M., Muller, U., and Zhang, J. (2016). End to End Learning for Self-Driving Cars. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1177\/0278364916679498","article-title":"1 year, 1000 km: The oxford robotcar dataset","volume":"36","author":"Maddern","year":"2017","journal-title":"Int. J. Robot. Res."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The kitti dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_53","unstructured":"Godard, C., Mac Aodha, O., Firman, M., and Brostow, G.J. (November, January 27). Digging into self-supervised monocular depth estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_54","first-page":"586","article-title":"Method for registration of 3-D shapes","volume":"Volume 1611","author":"Besl","year":"1992","journal-title":"Proceedings of the Sensor Fusion IV: Control Paradigms and Data Structures"},{"key":"ref_55","unstructured":"Li, J., and Lee, G.H. (November, January 27). Usip: Unsupervised stable interest point detection from 3d point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_56","unstructured":"Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., and Guibas, L.J. (November, January 27). Kpconv: Flexible and deformable convolution for point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Hu, Q., Yang, B., Xie, L., Rosa, S., Guo, Y., Wang, Z., Trigoni, N., and Markham, A. 
(2020, January 13\u201319). Randla-net: Efficient semantic segmentation of large-scale point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01112"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Choy, C., Gwak, J., and Savarese, S. (2019, January 15\u201320). 4d spatio-temporal convnets: Minkowski convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00319"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Hu, W., Zhao, H., Jiang, L., Jia, J., and Wong, T.T. (2021, January 20\u201325). Bidirectional projection network for cross dimension scene understanding. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01414"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"2547","DOI":"10.1109\/TNNLS.2020.3006524","article-title":"Scene segmentation with dual relation-aware attention network","volume":"32","author":"Fu","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Choi, S., Kim, J.T., and Choo, J. (2020, January 13\u201319). Cars can\u2019t fly up in the sky: Improving urban-scene segmentation via height-driven attention networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00939"},{"key":"ref_62","unstructured":"Yan, H., Zhang, C., and Wu, M. (2022). Lawin transformer: Improving semantic segmentation transformer with multi-scale representations via large window attention. arXiv."},{"key":"ref_63","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. 
Process. Syst., 30."},{"key":"ref_64","unstructured":"Katharopoulos, A., Vyas, A., Pappas, N., and Fleuret, F. (2020, January 13\u201318). Transformers are rnns: Fast autoregressive transformers with linear attention. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_66","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Qin, Z., Yu, H., Wang, C., Guo, Y., Peng, Y., and Xu, K. (2022, January 18\u201324). Geometric transformer for fast and robust point cloud registration. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01086"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/24\/6301\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:40:07Z","timestamp":1760146807000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/24\/6301"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,12]]},"references-count":67,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["rs14246301"],"URL":"https:\/\/doi.org\/10.3390\/rs14246301","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2022,12,12]]}}}