{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T03:14:34Z","timestamp":1774322074192,"version":"3.50.1"},"reference-count":91,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T00:00:00Z","timestamp":1707177600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T00:00:00Z","timestamp":1707177600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Academic of Finland","award":["327911, 353138"],"award-info":[{"award-number":["327911, 353138"]}]},{"name":"Junior Star GACR","award":["GM 21-28830M"],"award-info":[{"award-number":["GM 21-28830M"]}]},{"name":"Programme Johannes Amos Comenius","award":["CZ.02.01.01\/00\/22_010\/0003405"],"award-info":[{"award-number":["CZ.02.01.01\/00\/22_010\/0003405"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. 
In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments. It sets a new state-of-the-art for single-image localization on the 7-Scenes, 12-Scenes, Cambridge Landmarks datasets, and the combined indoor scenes.<\/jats:p>","DOI":"10.1007\/s11263-023-01982-9","type":"journal-article","created":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T18:09:15Z","timestamp":1707242955000},"page":"2530-2550","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer"],"prefix":"10.1007","volume":"132","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1281-4370","authenticated-orcid":false,"given":"Shuzhe","family":"Wang","sequence":"first","affiliation":[]},{"given":"Zakaria","family":"Laskar","sequence":"additional","affiliation":[]},{"given":"Iaroslav","family":"Melekhov","sequence":"additional","affiliation":[]},{"given":"Xiaotian","family":"Li","sequence":"additional","affiliation":[]},{"given":"Yi","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Giorgos","family":"Tolias","sequence":"additional","affiliation":[]},{"given":"Juho","family":"Kannala","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,6]]},"reference":[{"key":"1982_CR1","doi-asserted-by":"crossref","unstructured":"Arandjelovi\u0107, R., Gronat, P., Torii, A., Pajdla, T. & Sivic, J. (2016). NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 
5297\u20135307).","DOI":"10.1109\/CVPR.2016.572"},{"key":"1982_CR2","doi-asserted-by":"crossref","unstructured":"Balntas, V., Li, S. & Prisacariu, V. (2018). RelocNet: Continuous metric learning relocalisation using neural nets. In Proceedings of the European conference on computer vision (ECCV) (pp. 751\u2013767). Springer International Publishing.","DOI":"10.1007\/978-3-030-01264-9_46"},{"key":"1982_CR3","doi-asserted-by":"crossref","unstructured":"Balntas, V., Riba, E., Ponsa, D. & Mikolajczyk, K. (2016). Learning local feature descriptors with triplets and shallow convolutional neural networks. In Proceedings of the British machine vision conference (BMVC)","DOI":"10.5244\/C.30.119"},{"key":"1982_CR4","doi-asserted-by":"crossref","unstructured":"Bay, H., Tuytelaars, T. & Van\u00a0Gool, L. (2006). SURF: Speeded up robust features. In Proceedings of the European conference on computer vision (ECCV) (pp. 404\u2013417). Springer International Publishing.","DOI":"10.1007\/11744023_32"},{"key":"1982_CR5","doi-asserted-by":"crossref","unstructured":"Brachmann, E., Humenberger, M., Rother, C. & Sattler, T. (2021). On the limits of pseudo ground truth in visual camera re-localisation. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 6218\u20136228).","DOI":"10.1109\/ICCV48922.2021.00616"},{"key":"1982_CR6","doi-asserted-by":"crossref","unstructured":"Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S. & Rother, C. (2017). DSAC - Differentiable RANSAC for camera localization. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 6684\u20136692).","DOI":"10.1109\/CVPR.2017.267"},{"key":"1982_CR7","doi-asserted-by":"crossref","unstructured":"Brachmann, E., Michel, F., Krull, A., Yang, M.Y., Gumhold, S. & Rother, C. (2016). Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3364\u20133372).","DOI":"10.1109\/CVPR.2016.366"},{"key":"1982_CR8","doi-asserted-by":"crossref","unstructured":"Brachmann, E. & Rother, C. (2018). Learning less is more - 6D camera localization via 3D surface regression. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 4654\u20134662)","DOI":"10.1109\/CVPR.2018.00489"},{"key":"1982_CR9","doi-asserted-by":"crossref","unstructured":"Brachmann, E. & Rother, C. (2019). Expert sample consensus applied to camera re-localization. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 7524\u20137533)","DOI":"10.1109\/ICCV.2019.00762"},{"key":"1982_CR10","doi-asserted-by":"crossref","unstructured":"Brachmann, E. & Rother, C. (2019). Expert sample consensus applied to camera re-localization. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 7525\u20137534).","DOI":"10.1109\/ICCV.2019.00762"},{"key":"1982_CR11","doi-asserted-by":"crossref","unstructured":"Brachmann, E. & Rother, C. (2019). Neural-guided RANSAC: Learning where to sample model hypotheses. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV) (pp. 4322\u20134331).","DOI":"10.1109\/ICCV.2019.00442"},{"issue":"9","key":"1982_CR12","first-page":"5847","volume":"44","author":"E Brachmann","year":"2021","unstructured":"Brachmann, E., & Rother, C. (2021). Visual camera re-localization from RGB and RGB-D images using DSAC. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5847\u20135865.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1982_CR13","doi-asserted-by":"crossref","unstructured":"Brahmbhatt, S., Gu, J., Kim, K., Hays, J. & Kautz, J. (2018). Geometry-aware learning of maps for camera localization. 
In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 2616\u20132625).","DOI":"10.1109\/CVPR.2018.00277"},{"key":"1982_CR14","unstructured":"Budvytis, I., Teichmann, M., Vojir, T. & Cipolla, R. (2019). Large scale joint semantic re-localisation and scene understanding via globally unique instance coordinate regression. In Proceedings of the British machine vision conference (BMVC)"},{"key":"1982_CR15","unstructured":"Bui, M., Albarqouni, S., Ilic, S. & Navab, N. (2018). Scene coordinate and correspondence learning for image-based localization. In Proceedings of the British machine vision conference (BMVC)"},{"key":"1982_CR16","doi-asserted-by":"crossref","unstructured":"Calonder, M., Lepetit, V., Strecha, C. & Fua, P. (2010). BRIEF: Binary robust independent elementary features. In Proceedings of the European conference on computer vision (ECCV) (pp. 778\u2013792). Springer Berlin Heidelberg","DOI":"10.1007\/978-3-642-15561-1_56"},{"key":"1982_CR17","doi-asserted-by":"crossref","unstructured":"Cavallari, T., Bertinetto, L., Mukhoti, J., Torr, P. & Golodetz, S. (2019). Let\u2019s take this online: Adapting scene coordinate regression network predictions for online RGB-D camera relocalisation. In: International conference on 3D vision (3DV) (pp. 564\u2013573).","DOI":"10.1109\/3DV.2019.00068"},{"issue":"10","key":"1982_CR18","doi-asserted-by":"publisher","first-page":"2465","DOI":"10.1109\/TPAMI.2019.2915068","volume":"42","author":"T Cavallari","year":"2020","unstructured":"Cavallari, T., Golodetz, S., Lord, N., Valentin, J., Prisacariu, V., Di Stefano, L., & Torr, P. H. (2020). Real-time RGB-D camera pose estimation in novel scenes using a relocalisation cascade. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(10), 2465\u20132477.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1982_CR19","doi-asserted-by":"crossref","unstructured":"Cavallari, T., Golodetz, S., Lord, N.A., Valentin, J., Di\u00a0Stefano, L. & Torr, P.H. (2017). On-the-fly adaptation of regression forests for online camera relocalisation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 4457\u20134466).","DOI":"10.1109\/CVPR.2017.31"},{"key":"1982_CR20","doi-asserted-by":"crossref","unstructured":"Chen, S., Li, X., Wang, Z. & Prisacariu, V. (2022). Dfnet: Enhance absolute pose regression with direct feature matching. In Proceedings of the European conference on computer vision (ECCV) (pp. 1\u201317). Springer Nature Switzerland.","DOI":"10.1007\/978-3-031-20080-9_1"},{"key":"1982_CR21","doi-asserted-by":"crossref","unstructured":"Chen, S., Wang, Z. & Prisacariu, V. (2021). Direct-posenet: Absolute pose regression with photometric consistency. In International conference on 3D vision (3DV) (pp. 1175\u20131185).","DOI":"10.1109\/3DV53792.2021.00125"},{"key":"1982_CR22","doi-asserted-by":"crossref","unstructured":"DeTone, D., Malisiewicz, T. & Rabinovich, A. (2018). Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 224\u2013236).","DOI":"10.1109\/CVPRW.2018.00060"},{"key":"1982_CR23","doi-asserted-by":"crossref","unstructured":"Ding, M., Wang, Z., Sun, J., Shi, J. & Luo, P. (2019). CamNet: Coarse-to-fine retrieval for camera re-localization. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 2871\u20132880).","DOI":"10.1109\/ICCV.2019.00296"},{"key":"1982_CR24","doi-asserted-by":"crossref","unstructured":"Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A. & Sattler, T. (2019). 
D2-Net: A trainable CNN for joint detection and description of local features. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 8092\u20138101).","DOI":"10.1109\/CVPR.2019.00828"},{"issue":"6","key":"1982_CR25","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1145\/358669.358692","volume":"24","author":"MA Fischler","year":"1981","unstructured":"Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381\u2013395.","journal-title":"Communications of the ACM"},{"issue":"3","key":"1982_CR26","doi-asserted-by":"publisher","first-page":"5737","DOI":"10.1109\/LRA.2021.3082473","volume":"6","author":"P Guan","year":"2021","unstructured":"Guan, P., Cao, Z., Yu, J., Zhou, C., & Tan, M. (2021). Scene coordinate regression network with global context-guided spatial feature transformation for visual relocalization. IEEE Robotics and Automation Letters, 6(3), 5737\u20135744.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"1982_CR27","doi-asserted-by":"crossref","unstructured":"Guzm\u00e1n-Rivera, A., Kohli, P., Glocker, B., Shotton, J., Sharp, T., Fitzgibbon, A.W. & Izadi, S. (2014). Multi-output learning for camera relocalization. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 1114\u20131121).","DOI":"10.1109\/CVPR.2014.146"},{"key":"1982_CR28","unstructured":"Han, X., Leung, T., Jia, Y., Sukthankar, R. & Berg, A.C. (2015). Matchnet: Unifying feature and metric learning for patch-based matching. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 3279\u20133286)."},{"key":"1982_CR29","doi-asserted-by":"crossref","unstructured":"Huang, Z., Zhou, H., Li, Y., Yang, B., Xu, Y., Zhou, X., Bao, H., Zhang, G. & Li, H. (2021). 
VS-Net: Voting with segmentation for visual localization. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 6101\u20136111).","DOI":"10.1109\/CVPR46437.2021.00604"},{"key":"1982_CR30","doi-asserted-by":"crossref","unstructured":"Jiang, W., Trulls, E., Hosang, J., Tagliasacchi, A. & Yi, K.M. (2021). COTR: Correspondence transformer for matching across images. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 6207\u20136217).","DOI":"10.1109\/ICCV48922.2021.00615"},{"key":"1982_CR31","unstructured":"Katharopoulos, A., Vyas, A., Pappas, N. & Fleuret, F. (2020). Transformers are rnns: Fast autoregressive transformers with linear attention. In Proceedings of the 37th international conference on machine learning (ICML) (pp. 5156\u20135165). JMLR"},{"key":"1982_CR32","doi-asserted-by":"crossref","unstructured":"Kendall, A. & Cipolla, R. (2016). Modelling uncertainty in deep learning for camera relocalization. In Proceedings of the IEEE international conference on robotics and automation (ICRA) (pp. 4762\u20134769).","DOI":"10.1109\/ICRA.2016.7487679"},{"key":"1982_CR33","doi-asserted-by":"crossref","unstructured":"Kendall, A., Cipolla, R. (2017). Geometric loss functions for camera pose regression with deep learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 5974\u20135983).","DOI":"10.1109\/CVPR.2017.694"},{"key":"1982_CR34","unstructured":"Kendall, A., Gal, Y. & Cipolla, R. (2018). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp 7482\u20137491)."},{"key":"1982_CR35","doi-asserted-by":"crossref","unstructured":"Kendall, A., Grimes, M. & Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DoF camera relocalization. 
In: Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp 2938\u20132946).","DOI":"10.1109\/ICCV.2015.336"},{"key":"1982_CR36","unstructured":"Kingma, D.P. & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980"},{"key":"1982_CR37","doi-asserted-by":"crossref","unstructured":"Laskar, Z., Melekhov, I., Kalia, S. & Kannala, J. (2017). Camera relocalization by computing pairwise relative poses using convolutional neural network. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) workshops (pp. 929\u2013938).","DOI":"10.1109\/ICCVW.2017.113"},{"key":"1982_CR38","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, S., Zhao, Y., Verbeek, J. & Kannala, J. (2020). Hierarchical scene coordinate classification and regression for visual localization. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 11,983\u201311,992).","DOI":"10.1109\/CVPR42600.2020.01200"},{"key":"1982_CR39","doi-asserted-by":"crossref","unstructured":"Li, X., Ylioinas, J. & Kannala, J. (2018). Full-frame scene coordinate regression for image-based localization. In Proceedings of robotics: science and systems (RSS)","DOI":"10.15607\/RSS.2018.XIV.015"},{"key":"1982_CR40","doi-asserted-by":"crossref","unstructured":"Li, X., Ylioinas, J., Verbeek, J. & Kannala, J. (2018). Scene coordinate regression with angle-based reprojection loss for camera relocalization. In Proceedings of the European conference on computer vision (ECCV) workshops (pp 229\u2013245). Springer International Publishing.","DOI":"10.1007\/978-3-030-11015-4_19"},{"key":"1982_CR41","doi-asserted-by":"crossref","unstructured":"Li, X., Ylioinas, J., Verbeek, J. & Kannala, J. (2018). Scene coordinate regression with angle-based reprojection loss for camera relocalization. In Proceedings of the European conference on computer vision (ECCV) workshops (pp. 
0\u20130).","DOI":"10.1007\/978-3-030-11015-4_19"},{"issue":"2","key":"1982_CR42","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","volume":"60","author":"DG Lowe","year":"2004","unstructured":"Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91\u2013110.","journal-title":"International Journal of Computer Vision"},{"key":"1982_CR43","doi-asserted-by":"crossref","unstructured":"Luo, Z., Shen, T., Zhou, L., Zhang, J., Yao, Y., Li, S., Fang, T. & Quan, L. (2019). Contextdesc: Local descriptor augmentation with cross-modality context. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 2527\u20132536).","DOI":"10.1109\/CVPR.2019.00263"},{"key":"1982_CR44","doi-asserted-by":"crossref","unstructured":"Massiceti, D., Krull, A., Brachmann, E., Rother, C., & Torr, P.H. (2017). Random forests versus neural networks\u2014What\u2019s best for camera localization? In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (pp. 5118\u20135125).","DOI":"10.1109\/ICRA.2017.7989598"},{"key":"1982_CR45","unstructured":"Melekhov, I., Brostow, G.J., Kannala, J. & Turmukhambetov, D. (2020). Image stylization for robust features. ArXiv preprint arXiv:2008.06959."},{"key":"1982_CR46","doi-asserted-by":"crossref","unstructured":"Melekhov, I., Kannala, J. & Rahtu, E. (2017). Image patch matching using convolutional descriptors with euclidean distance. In Proceedings of the Asian conference on computer vision (ACCV) workshops (pp. 638\u2013653). Springer.","DOI":"10.1007\/978-3-319-54526-4_46"},{"key":"1982_CR47","doi-asserted-by":"crossref","unstructured":"Melekhov, I., Laskar, Z., Li, X., Wang, S. & Kannala, J. (2021). Digging into self-supervised learning of feature descriptors. In: International conference on 3D vision (3DV) (pp. 
1144\u20131155).","DOI":"10.1109\/3DV53792.2021.00122"},{"key":"1982_CR48","doi-asserted-by":"crossref","unstructured":"Melekhov, I., Ylioinas, J., Kannala, J. & Rahtu, E. (2017). Image-based localization using hourglass networks. In: Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) Workshops (pp. 879\u2013886).","DOI":"10.1109\/ICCVW.2017.107"},{"key":"1982_CR49","doi-asserted-by":"crossref","unstructured":"Meng, L., Chen, J., Tung, F., Little, J.J., Valentin, J. & de\u00a0Silva, C.W. (2017). Backtracking regression forests for accurate camera relocalization. In Proceedings of the IEEE\/RSJ international conference on intelligent robots and systems (IROS) (pp. 6886\u20136893).","DOI":"10.1109\/IROS.2017.8206611"},{"key":"1982_CR50","doi-asserted-by":"crossref","unstructured":"Meng, L., Tung, F., Little, J.J., Valentin, J. & de\u00a0Silva, C.W. (2018). Exploiting points and lines in regression forests for RGB-D camera relocalization. In Proceedings of the IEEE\/RSJ international conference on intelligent robots and systems (IROS) (pp. 6827\u20136834).","DOI":"10.1109\/IROS.2018.8593505"},{"key":"1982_CR51","unstructured":"Mishchuk, A., Mishkin, D., Radenovic, F. & Matas, J. (2017). Working hard to know your neighbor\u2019s margins: Local descriptor learning loss. In Advances in Neural Information Processing Systems (NIPS)( vol.\u00a030, pp. 4826\u20134837). Curran Associates, Inc."},{"key":"1982_CR52","unstructured":"Moreau, A., Piasco, N., Tsishkou, D., Stanciulescu, B. & de\u00a0La\u00a0Fortelle, A. (2021). LENS: Localization enhanced by neRF synthesis. In Annual conference on robot learning"},{"issue":"1","key":"1982_CR53","doi-asserted-by":"publisher","first-page":"3942","DOI":"10.1609\/aaai.v32i1.11671","volume":"32","author":"E Perez","year":"2018","unstructured":"Perez, E., Strub, F., De Vries, H., Dumoulin, V., & Courville, A. (2018). Film: Visual reasoning with a general conditioning layer. 
Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 3942\u20133951.","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"1982_CR54","doi-asserted-by":"crossref","unstructured":"Radenovi\u0107, F., Tolias, G. & Chum, O. (2016). CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples. In Proceedings of the European conference on computer vision (ECCV) (pp. 3\u201320). Springer International Publishing.","DOI":"10.1007\/978-3-319-46448-0_1"},{"issue":"4","key":"1982_CR55","doi-asserted-by":"publisher","first-page":"4407","DOI":"10.1109\/LRA.2018.2869640","volume":"3","author":"N Radwan","year":"2018","unstructured":"Radwan, N., Valada, A., & Burgard, W. (2018). VLocNet++: Deep multitask learning for semantic visual localization and odometry. IEEE Robotics and Automation Letters, 3(4), 4407\u20134414.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"1982_CR56","unstructured":"Revaud, J., De\u00a0Souza, C., Humenberger, M. & Weinzaepfel, P. (2019). R2D2: Reliable and repeatable detector and descriptor. In: Advances in neural information processing systems (NeurIPS) (Vol.\u00a032, pp. 12,405\u201312,415). Curran Associates, Inc."},{"key":"1982_CR57","doi-asserted-by":"crossref","unstructured":"Rogez, G., Weinzaepfel, P. & Schmid, C. (2017). LCR-Net: Localization-classification-regression for human pose. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 3433\u20133441).","DOI":"10.1109\/CVPR.2017.134"},{"issue":"5","key":"1982_CR58","first-page":"1146","volume":"42","author":"G Rogez","year":"2019","unstructured":"Rogez, G., Weinzaepfel, P., & Schmid, C. (2019). LCR-Net++: Multi-person 2D and 3D pose detection in natural images. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(5), 1146\u20131161.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1982_CR59","doi-asserted-by":"crossref","unstructured":"Rublee, E., Rabaud, V., Konolige, K. & Bradski, G.R. (2011). ORB: An efficient alternative to SIFT or SURF. In: Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 2564\u20132571).","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"1982_CR60","unstructured":"Saha, S., Varma, G. & Jawahar, C. (2018). Improved visual relocalization by discovering anchor points. In Proceedings of the British machine vision conference (BMVC)"},{"key":"1982_CR61","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., Cadena, C., Siegwart, R. & Dymczyk, M. (2019). From coarse to fine: Robust hierarchical localization at large scale. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp 12,716\u201312,725).","DOI":"10.1109\/CVPR.2019.01300"},{"key":"1982_CR62","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., DeTone, D., Malisiewicz, T. & Rabinovich, A. (2020). Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp 4938\u20134947).","DOI":"10.1109\/CVPR42600.2020.00499"},{"key":"1982_CR63","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., Unagar, A., Larsson, M., Germain, H., Toft, C., Larsson, V., Pollefeys, M., Lepetit, V., Hammarstrand, L., Kahl, F., & Sattler, T. (2021). Back to the feature: Learning robust camera localization from pixels to pose. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 3247\u20133257).","DOI":"10.1109\/CVPR46437.2021.00326"},{"key":"1982_CR64","doi-asserted-by":"crossref","unstructured":"Sattler, T., Leibe, B., & Kobbelt, L. (2011). 
Fast image-based localization using direct 2d-to-3d matching. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 667\u2013674).","DOI":"10.1109\/ICCV.2011.6126302"},{"key":"1982_CR65","doi-asserted-by":"crossref","unstructured":"Sattler, T., Leibe, B., & Kobbelt, L. (2012). Improving image-based localization by active correspondence search. In Proceedings of the European Conference on computer vision (ECCV) (pp. 752\u2013765). Springer International Publishing.","DOI":"10.1007\/978-3-642-33718-5_54"},{"issue":"9","key":"1982_CR66","doi-asserted-by":"publisher","first-page":"1744","DOI":"10.1109\/TPAMI.2016.2611662","volume":"39","author":"T Sattler","year":"2016","unstructured":"Sattler, T., Leibe, B., & Kobbelt, L. (2016). Efficient & effective prioritized matching for large-scale image-based localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9), 1744\u20131756.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"9","key":"1982_CR67","doi-asserted-by":"publisher","first-page":"1744","DOI":"10.1109\/TPAMI.2016.2611662","volume":"39","author":"T Sattler","year":"2016","unstructured":"Sattler, T., Leibe, B., & Kobbelt, L. (2016). Efficient & effective prioritized matching for large-scale image-based localization. IEEE Transactions on Pattern Analysis And Machine Intelligence, 39(9), 1744\u20131756.","journal-title":"IEEE Transactions on Pattern Analysis And Machine Intelligence"},{"key":"1982_CR68","doi-asserted-by":"crossref","unstructured":"Sattler, T., Maddern, W., Toft, C., Torii, A., Hammarstrand, L., Stenborg, E., Safari, D., Okutomi, M., Pollefeys, M., Sivic, J., Kahl, F., Pajdla, T. (2018). Benchmarking 6DoF outdoor visual localization in changing conditions. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 
8601\u20138610).","DOI":"10.1109\/CVPR.2018.00897"},{"key":"1982_CR69","doi-asserted-by":"crossref","unstructured":"Sattler, T., Zhou, Q., Pollefeys, M. & Leal-Taixe, L. (2019). Understanding the limitations of CNN-based absolute camera pose regression. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 3302\u20133312).","DOI":"10.1109\/CVPR.2019.00342"},{"key":"1982_CR70","doi-asserted-by":"crossref","unstructured":"Sch\u00f6nberger, J.L., Zheng, E., Pollefeys, M. & Frahm, J.M. (2016). Pixelwise view selection for unstructured multi-view stereo. In Proceedings of the European conference on computer vision (ECCV)","DOI":"10.1007\/978-3-319-46487-9_31"},{"key":"1982_CR71","doi-asserted-by":"crossref","unstructured":"Shavit, Y., Ferens, R. & Keller, Y. (2021). Learning multi-scene absolute pose regression with transformers. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp 2733\u20132742).","DOI":"10.1109\/ICCV48922.2021.00273"},{"key":"1982_CR72","doi-asserted-by":"crossref","unstructured":"Shavit, Y. & Keller, Y. (2022). Camera pose auto-encoders for improving pose regression. In Proceedings of the European conference on computer vision (ECCV) (pp. 140\u2013157). Springer International Publishing","DOI":"10.1007\/978-3-031-20080-9_9"},{"key":"1982_CR73","doi-asserted-by":"crossref","unstructured":"Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., & Fitzgibbon, A. (2013). Scene coordinate regression forests for camera relocalization in RGB-D images. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 2930\u20132937).","DOI":"10.1109\/CVPR.2013.377"},{"key":"1982_CR74","doi-asserted-by":"crossref","unstructured":"Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., Fua, P., & Moreno-Noguer, F. (2015). Discriminative learning of deep convolutional feature point descriptors. 
In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 118\u2013126).","DOI":"10.1109\/ICCV.2015.22"},{"key":"1982_CR75","doi-asserted-by":"crossref","unstructured":"Sun, J., Shen, Z., Wang, Y., Bao, H. & Zhou, X. (2021). LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 8922\u20138931).","DOI":"10.1109\/CVPR46437.2021.00881"},{"key":"1982_CR76","doi-asserted-by":"crossref","unstructured":"Taira, H., Okutomi, M., Sattler, T., Cimpoi, M., Pollefeys, M., Sivic, J., Pajdla, T., & Torii, A. (2018). Inloc: Indoor visual localization with dense matching and view synthesis. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 7199\u20137209).","DOI":"10.1109\/CVPR.2018.00752"},{"key":"1982_CR77","doi-asserted-by":"crossref","unstructured":"Tian, Y., Fan, B. & Wu, F. (2017). L2-net: Deep learning of discriminative patch descriptor in euclidean space. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 661\u2013669).","DOI":"10.1109\/CVPR.2017.649"},{"key":"1982_CR78","unstructured":"Tyszkiewicz, M., Fua, P., & Trulls, E. (2020). DISK: Learning local features with policy. In Advances in neural information processing systems (NeurIPS) (Vol.\u00a033, pp. 14,254\u201314,265). Curran Associates, Inc."},{"key":"1982_CR79","doi-asserted-by":"crossref","unstructured":"Valada, A., Radwan, N. & Burgard, W. (2018). Deep auxiliary learning for visual localization and odometry. In Proceedings of the IEEE international conference on robotics and automation (ICRA) (pp. 6939\u20136946).","DOI":"10.1109\/ICRA.2018.8462979"},{"key":"1982_CR80","doi-asserted-by":"crossref","unstructured":"Valentin, J., Dai, A., Nie\u00dfner, M., Kohli, P., Torr, P., Izadi, S., & Keskin, C. (2016). Learning to navigate the energy landscape. 
In International conference on 3D vision (3DV) (pp. 323\u2013332).","DOI":"10.1109\/3DV.2016.41"},{"key":"1982_CR81","doi-asserted-by":"crossref","unstructured":"Valentin, J., Nie\u00dfner, M., Shotton, J., Fitzgibbon, A., Izadi, S., & Torr, P.H. (2015). Exploiting uncertainty in regression forests for accurate camera relocalization. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 4400\u20134408).","DOI":"10.1109\/CVPR.2015.7299069"},{"key":"1982_CR82","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (NeurIPS) (Vol. 30, pp. 5998\u20136008). Curran Associates, Inc."},{"key":"1982_CR83","doi-asserted-by":"crossref","unstructured":"Walch, F., Hazirbas, C., Leal-Taixe, L., Sattler, T., Hilsenbeck, S., & Cremers, D. (2017). Image-based localization using LSTMs for structured feature correlation. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 627\u2013637).","DOI":"10.1109\/ICCV.2017.75"},{"key":"1982_CR84","doi-asserted-by":"crossref","unstructured":"Wang, Q., Zhou, X., Hariharan, B., & Snavely, N. (2020). Learning feature descriptors using camera pose supervision. In Proceedings of the European conference on computer vision (ECCV) (pp. 757\u2013774). Springer International Publishing","DOI":"10.1007\/978-3-030-58452-8_44"},{"key":"1982_CR85","doi-asserted-by":"crossref","unstructured":"Wang, S., Laskar, Z., Melekhov, I., Li, X., & Kannala, J. (2021). Continual learning for image-based camera localization. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 3252\u20133262).","DOI":"10.1109\/ICCV48922.2021.00324"},{"key":"1982_CR86","doi-asserted-by":"crossref","unstructured":"Wang, Y., Ma, X., Chen, Z., Luo, Y., Yi, J., & Bailey, J. (2019). 
Symmetric cross entropy for robust learning with noisy labels. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 322\u2013330).","DOI":"10.1109\/ICCV.2019.00041"},{"key":"1982_CR87","doi-asserted-by":"crossref","unstructured":"Weinzaepfel, P., Csurka, G., Cabon, Y., & Humenberger, M. (2019). Visual localization by learning objects-of-interest dense match regression. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 5634\u20135643).","DOI":"10.1109\/CVPR.2019.00578"},{"key":"1982_CR88","doi-asserted-by":"crossref","unstructured":"Xue, F., Wang, X., Yan, Z., Wang, Q., Wang, J., & Zha, H. (2019). Local supports global: Deep camera relocalization with sequence enhancement. In Proceedings of the IEEE\/CVF international conference on computer vision (ICCV) (pp. 2841\u20132850).","DOI":"10.1109\/ICCV.2019.00293"},{"key":"1982_CR89","doi-asserted-by":"crossref","unstructured":"Xue, F., Wu, X., Cai, S., & Wang, J. (2020). Learning multi-view camera relocalization with graph neural networks. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 11,375\u201311,384).","DOI":"10.1109\/CVPR42600.2020.01139"},{"key":"1982_CR90","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S., & Komodakis, N. (2015). Learning to compare image patches via convolutional neural networks. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 4353\u20134361).","DOI":"10.1109\/CVPR.2015.7299064"},{"key":"1982_CR91","doi-asserted-by":"crossref","unstructured":"Zhou, Q., Sattler, T., & Leal-Taix\u00e9, L. (2021). Patch2Pix: Epipolar-guided pixel-level correspondences. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR) (pp. 
4669\u20134678).","DOI":"10.1109\/CVPR46437.2021.00464"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-023-01982-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-023-01982-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-023-01982-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,19]],"date-time":"2024-06-19T13:18:44Z","timestamp":1718803124000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-023-01982-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,6]]},"references-count":91,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["1982"],"URL":"https:\/\/doi.org\/10.1007\/s11263-023-01982-9","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,6]]},"assertion":[{"value":"10 February 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 December 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 February 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this 
article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}