{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:09:15Z","timestamp":1760231355762,"version":"build-2065373602"},"reference-count":28,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2022,9,15]],"date-time":"2022-09-15T00:00:00Z","timestamp":1663200000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Science and Technology, Taiwan","award":["109-2221-E-002-207-MY3"],"award-info":[{"award-number":["109-2221-E-002-207-MY3"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Absolute pose regression (APR) for camera localization is a single-shot approach that encodes the information of a 3D scene in an end-to-end neural network. The camera pose result of APR methods can be observed as the linear combination of the base poses. Previous APR methods\u2019 base poses are learned from training data. However, the training data can limit the performance of the methods, which cannot be generalized to cover the entire scene. To solve this issue, we use handcrafted base poses instead of learning-based base poses, which prevents overfitting the camera poses of the training data. Moreover, we use a dual-stream network architecture to process color and depth images separately to get more accurate localization. On the 7 Scenes dataset, the proposed method is among the best in median rotation error, and in median translation error, it outperforms previous APR methods. 
On a more difficult dataset\u2014Oxford RobotCar dataset, the proposed method achieves notable improvements in median translation and rotation errors compared to the state-of-the-art APR methods.<\/jats:p>","DOI":"10.3390\/s22186971","type":"journal-article","created":{"date-parts":[[2022,9,16]],"date-time":"2022-09-16T01:35:10Z","timestamp":1663292110000},"page":"6971","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Absolute Camera Pose Regression Using an RGB-D Dual-Stream Network and Handcrafted Base Poses"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5582-1039","authenticated-orcid":false,"given":"Peng-Yuan","family":"Kao","sequence":"first","affiliation":[{"name":"Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei 10617, Taiwan"}]},{"given":"Rong-Rong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science & Information Engineering, National Taiwan University, Taipei 10617, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7900-890X","authenticated-orcid":false,"given":"Timothy","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science & Information Engineering, National Taiwan University, Taipei 10617, Taiwan"}]},{"given":"Yi-Ping","family":"Hung","sequence":"additional","affiliation":[{"name":"Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei 10617, Taiwan"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1744","DOI":"10.1109\/TPAMI.2016.2611662","article-title":"Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization","volume":"39","author":"Sattler","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. 
(PAMI)"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., Cadena, C., Siegwart, R., and Dymczyk, M. (2019, January 15\u201320). From Coarse to Fine: Robust Hierarchical Localization at Large Scale. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01300"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Torii, A., Arandjelovic, R., Sivic, J., Okutomi, M., and Pajdla, T. (2015, January 7\u201312). 24\/7 Place Recognition by View Synthesis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298790"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., and Sivic, J. (2016, January 27\u201330). NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.572"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Laskar, Z., Melekhov, I., Kalia, S., and Kannala, J. (2017, January 22\u201329). Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network. Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.113"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Balntas, V., Li, S., and Prisacariu, V. (2018, January 8\u201314). RelocNet: Continuous Metric Learning Relocalisation Using Neural Nets. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_46"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., and Fitzgibbon, A. (2013, January 23\u201328). 
Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.377"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S., and Rother, C. (2017, January 21\u201326). DSAC\u2014Differentiable RANSAC for Camera Localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.267"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Brachmann, E., and Rother, C. (2018, January 18\u201323). Learning Less is More\u20146D Camera Localization via 3D Surface Regression. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00489"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kendall, A., Grimes, M., and Cipolla, R. (2015, January 7\u201313). PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.336"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Kendall, A., and Cipolla, R. (2016, January 16\u201321). Modelling Uncertainty in Deep Learning for Camera Relocalization. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden.","DOI":"10.1109\/ICRA.2016.7487679"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kendall, A., and Cipolla, R. (2017, January 21\u201326). Geometric Loss Functions for Camera Pose Regression with Deep Learning. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.694"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1109\/TASE.2017.2664920","article-title":"Indoor Relocalization in Challenging Environments with Dual-Stream Convolutional Neural Networks","volume":"15","author":"Li","year":"2017","journal-title":"IEEE Trans. Autom. Sci. Eng."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Brahmbhatt, S., Gu, J., Kim, K., Hays, J., and Kautz, J. (2018, January 18\u201323). Geometry-Aware Learning of Maps for Camera Localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00277"},{"key":"ref_15","unstructured":"Tian, M., Nie, Q., and Shen, H. (August, January 31). 3D Scene Geometry-Aware Constraint for Camera Localization with Deep Learning. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sattler, T., Zhou, Q., Pollefeys, M., and Leal-Taixe, L. (2019, January 15\u201320). Understanding the Limitations of CNN-based Absolute Camera Pose Regression. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00342"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Levin, A., Lischinski, D., and Weiss, Y. (2004). Colorization Using Optimization, ACM SIGGRAPH.","DOI":"10.1145\/1186562.1015780"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012, January 7\u201313). Indoor Segmentation and Support Inference from RGBD Images. 
Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., and Davison, A. (2011, January 16\u201319). KinectFusion: Real-Time 3D Reconstruction and Interaction Using A Moving Depth Camera. Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST), Santa Barbara, CA, USA.","DOI":"10.1145\/2047196.2047270"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Newcombe, R.A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A.J., Kohli, P., Shotton, J., Hodges, S., and Fitzgibbon, A. (2011, January 26\u201329). KinectFusion: Real-Time Dense Surface Mapping and Tracking. Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Basel, Switzerland.","DOI":"10.1109\/ISMAR.2011.6092378"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Valentin, J., Dai, A., Nie\u00dfner, M., Kohli, P., Torr, P., Izadi, S., and Keskin, C. (2016). Learning to Navigate the Energy Landscape. arXiv.","DOI":"10.1109\/3DV.2016.41"},{"key":"ref_22","first-page":"1","article-title":"Real-Time 3D Reconstruction at Scale Using Voxel Hashing","volume":"32","author":"Izadi","year":"2013","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1177\/0278364916679498","article-title":"1 Year, 1000 km: The Oxford RobotCar Dataset","volume":"36","author":"Maddern","year":"2017","journal-title":"Int. J. Robot. Res. (IJRR)"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Hu, M., Wang, S., Li, B., Ning, S., Fan, L., and Gong, X. (2021). PENet: Towards Precise and Efficient Image Guided Depth Completion. 
arXiv.","DOI":"10.1109\/ICRA48506.2021.9561035"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Park, J., Joo, K., Hu, Z., Liu, C.K., and So Kweon, I. (2020). Non-Local Spatial Propagation Network for Depth Completion. Proceedings of the European Conference on Computer Vision (ECCV) 2020, Springer.","DOI":"10.1007\/978-3-030-58601-0_8"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Van Gansbeke, W., Neven, D., De Brabandere, B., and Van Gool, L. (2019, January 27\u201331). Sparse and Noisy LIDAR Completion with RGB Guidance and Uncertainty. Proceedings of the 16th International Conference on Machine Vision Applications (MVA), Tokyo, Japan.","DOI":"10.23919\/MVA.2019.8757939"},{"key":"ref_27","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA."},{"key":"ref_28","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017). PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/18\/6971\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:32:03Z","timestamp":1760142723000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/18\/6971"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,15]]},"references-count":28,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["s22186971"],"URL":"https:\/\/doi.org\/10.3390\/s22186971","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,9,15]]}}}