{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T22:54:34Z","timestamp":1768690474007,"version":"3.49.0"},"reference-count":33,"publisher":"MDPI AG","issue":"23","license":[{"start":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T00:00:00Z","timestamp":1607040000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the National Key Research and Development Program of China","award":["No. 2019YFB1310004"],"award-info":[{"award-number":["No. 2019YFB1310004"]}]},{"name":"the Key Research and Development Program of Guangdong Province","award":["No. 2020B090928002"],"award-info":[{"award-number":["No. 2020B090928002"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The traditional CNN for 6D robot relocalization which outputs pose estimations does not interpret whether the model is making sensible predictions or just guessing at random. We found that convnet representations trained on classification problems generalize well to other tasks. Thus, we propose a multi-task CNN for robot relocalization, which can simultaneously perform pose regression and scene recognition. Scene recognition determines whether the input image belongs to the current scene in which the robot is located, not only reducing the error of relocalization but also making us understand with what confidence we can trust the prediction. Meanwhile, we found that when there is a large visual difference between testing images and training images, the pose precision becomes low. Based on this, we present the dual-level image-similarity strategy (DLISS), which consists of two levels: initial level and iteration-level. The initial level performs feature vector clustering in the training set and feature vector acquisition in testing images. 
The iteration level, namely, the PSO-based image-block selection algorithm, can select the testing images which are the most similar to training images based on the initial level, enabling us to gain higher pose accuracy in testing set. Our method considers both the accuracy and the robustness of relocalization, and it can operate indoors and outdoors in real time, taking at most 27 ms per frame to compute. Finally, we used the Microsoft 7Scenes dataset and the Cambridge Landmarks dataset to evaluate our method. It can obtain approximately 0.33 m and 7.51\u00b0 accuracy on 7Scenes dataset, and get approximately 1.44 m and 4.83\u00b0 accuracy on the Cambridge Landmarks dataset. Compared with PoseNet, our CNN reduced the average positional error by 25% and the average angular error by 27.79% on 7Scenes dataset, and reduced the average positional error by 40% and the average angular error by 28.55% on the Cambridge Landmarks dataset. We show that our multi-task CNN can localize from high-level features and is robust to images which are not in the current scene. 
Furthermore, we show that our multi-task CNN gets higher accuracy of relocalization by using testing images obtained by DLISS.<\/jats:p>","DOI":"10.3390\/s20236943","type":"journal-article","created":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T11:59:00Z","timestamp":1607083140000},"page":"6943","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Visual Robot Relocalization Based on Multi-Task CNN and Image-Similarity Strategy"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7223-9553","authenticated-orcid":false,"given":"Tao","family":"Xie","sequence":"first","affiliation":[{"name":"State Key Laboratory of Robotics and System, Harbin Institute of Technology, 92 Xidazhi Street, Harbin 150006, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5615-0847","authenticated-orcid":false,"given":"Ke","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Robotics and System, Harbin Institute of Technology, 92 Xidazhi Street, Harbin 150006, China"}]},{"given":"Ruifeng","family":"Li","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Robotics and System, Harbin Institute of Technology, 92 Xidazhi Street, Harbin 150006, China"}]},{"given":"Xinyue","family":"Tang","sequence":"additional","affiliation":[{"name":"MFIN, Faculty of Business and Economics, The University of Hong Kong, Pokfulam Road, Hong Kong 999077, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1109\/LRA.2020.2964157","article-title":"Relocalization with submaps: Multi-session mapping for planetary rovers equipped with stereo cameras","volume":"5","author":"Giubilato","year":"2020","journal-title":"IEEE Robot. Autom. 
Lett."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"101552","DOI":"10.1016\/j.jengtecman.2019.11.003","article-title":"A theory of the evolution of technology: Technological parasitism and the implications for innovation management","volume":"55","author":"Coccia","year":"2020","journal-title":"J. Eng. Technol. Manag."},{"key":"ref_3","first-page":"1245","article-title":"Image-similarity-based Convolutional Neural Network for Robot Visual Relocalization","volume":"32","author":"Wang","year":"2020","journal-title":"Sens. Mater."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Schonberger, J., and Frahm, J.M. (2016, January 27\u201330). Structure-from-Motion Revisited. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Gee, A.P., and Mayol-Cuevas, W. (2012). 6D relocalisation for RGBD cameras using synthetic view regression. Cuevas, 1\u201311.","DOI":"10.5244\/C.26.113"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Glocker, B., Izadi, S., Shotton, J., and Criminisi, A. (2013, January 1\u20134). Real-time RGB-D camera relocalization. Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality, Adelaide, Australia.","DOI":"10.1109\/ISMAR.2013.6671777"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1699","DOI":"10.1109\/TPAMI.2011.41","article-title":"Automatic relocalization and loop closing for real-time monocular SLAM","volume":"33","author":"Williams","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Williams, B., Klein, G., and Reid, I. (2007, January 14\u201321). Real-time SLAM relocalisation. 
Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil.","DOI":"10.1109\/ICCV.2007.4409115"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Valentin, J., Nie\u00dfner, M., Shotton, J., Fitzgibbon, A., Izadi, S., and Torr, P. (2015, January 7\u201312). Exploiting uncertainty in regression forests for accurate camera relocalization. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299069"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1109\/TMI.2013.2282997","article-title":"Visual SLAM for handheld monocular endoscope","volume":"33","author":"Grasa","year":"2014","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011, January 6\u201313). ORB: An efficient alternative to SIFT or SURF. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201327). Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_14","unstructured":"Hao, Q., Cai, R., Li, Z., Zhang, L., Pang, Y., and Wu, F. (2012, January 16\u201321). 3D visual phrases for landmark recognition. 
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, Y., Snavely, N., and Huttenlocher, D.P. (2010, January 5\u201311). Location recognition using prioritized feature matching. Proceedings of the European Conference on Computer Vision, Crete, Greece.","DOI":"10.1007\/978-3-642-15552-9_57"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., and Fitzgibbon, A. (2013, January 23\u201328). Scene coordinate regression forests for camera relocalization in rgb-d images. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.377"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"4431","DOI":"10.1109\/LRA.2020.3000429","article-title":"Regression Forest Based RGB-D Visual Relocalization Using Coarse-to-Fine Strategy","volume":"5","author":"Wang","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Brachmann, E., Krull, A., Nowozin, S., Shotton, J., Michel, F., Gumhold, S., and Rother, C. (2017, January 21\u201326). Dsac-differentiable ransac for camera localization. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.267"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Brachmann, E., and Rother, C. (2018, January 18\u201323). Learning less is more-6d camera localization via 3d surface regression. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00489"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xu, S., Chou, W., and Dong, H. (2019). 
A robust indoor localization system integrating visual localization aided by CNN-based image retrieval with Monte Carlo localization. Sensors, 19.","DOI":"10.3390\/s19020249"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Kendall, A., Grimes, M., and Cipolla, R. (2015, January 7\u201313). PoseNet: A convolutional network for real-time 6-dof camera relocalization. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.336"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_23","first-page":"4762","article-title":"Modelling uncertainty in deep learning for camera relocalization","volume":"31","author":"Kendall","year":"2015","journal-title":"Des. Eng. Anal. Reliab. Effic. Softw."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1016\/j.imavis.2019.06.014","article-title":"DeepDSAIR: Deep 6-DOF camera relocalization using deblurred semantic-aware image representation for large-scale outdoor environments","volume":"89","author":"Esfahani","year":"2019","journal-title":"Image Vis. Comput."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Melekhov, I., Ylioinas, J., Kannala, J., and Rahtu, E. (2017, January 22\u201329). Image-based localization using hourglass networks. Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.107"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wu, J., Ma, L., and Hu, X. (June, January 29). Delving deeper into convolutional neural networks for camera relocalization. 
Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989663"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Phan, T.V., and Nakagawa, M. (2014, January 1\u20134). Text\/Non-text classification in online handwritten documents with recurrent neural networks. Proceedings of the 2014 14th International Conference on Frontiers in Handwriting Recognition, Heraklion, Greece.","DOI":"10.1109\/ICFHR.2014.12"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Xu, P., and Sarikaya, R. (2014, January 4\u20139). Contextual domain classification in spoken language understanding systems using recurrent neural network. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853573"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Nguyen, A., Do, T.-T., Caldwell, D.G., and Tsagarakis, N.G. (2019, January 16\u201317). Real-time 6DOF pose relocalization for event cameras with stacked spatial LSTM networks. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00207"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Clark, R., Wang, S., Markham, A., Trigoni, N., and Wen, H. (2017, January 21\u201326). VidLoc: A deep spatio-temporal model for 6-dof video-clip relocalization. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.284"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_32","unstructured":"Kennedy, J., and Eberhart, R. (December, January 27). Particle swarm optimization. 
Proceedings of the ICNN\u201995-International Conference on Neural Networks, Perth, Australia."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Parsopoulos, K.E., and Vrahatis, M.N. (2002, January 10\u201314). Particle swarm optimization method in multiobjective problems. Proceedings of the 2002 ACM Symposium on Applied Computing, Madrid, Spain.","DOI":"10.1145\/508791.508907"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/23\/6943\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:41:42Z","timestamp":1760179302000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/23\/6943"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,4]]},"references-count":33,"journal-issue":{"issue":"23","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["s20236943"],"URL":"https:\/\/doi.org\/10.3390\/s20236943","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,4]]}}}