{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:18:36Z","timestamp":1760242716087,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2016,4,7]],"date-time":"2016-04-07T00:00:00Z","timestamp":1459987200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Mobile robots are of great help for automatic monitoring tasks in different environments. One of the first tasks that needs to be addressed when creating these kinds of robotic systems is modeling the robot environment. This work proposes a pipeline to build an enhanced visual model of a robot environment indoors. Vision-based recognition approaches frequently use quantized feature spaces, commonly known as Bag of Words (BoW) or vocabulary representations. A drawback of standard BoW approaches is that semantic information is not considered as a criterion to create the visual words. To solve this challenging task, this paper studies how to leverage the standard vocabulary construction process to obtain a more meaningful visual vocabulary of the robot work environment using image sequences. We take advantage of spatio-temporal constraints and prior knowledge about the position of the camera. The key contribution of our work is the definition of a new pipeline to create a model of the environment. This pipeline incorporates (1) tracking information into the process of vocabulary construction and (2) geometric cues into the appearance descriptors. Motivated by long-term robotic applications, such as the aforementioned monitoring tasks, we focus on a configuration where the robot camera points to the ceiling, which captures more stable regions of the environment. 
The experimental validation shows how our vocabulary models the environment in more detail than standard vocabulary approaches, without loss of recognition performance. We show different robotic tasks that could benefit from our visual vocabulary approach, such as place recognition or object discovery. For this validation, we use our publicly available dataset.<\/jats:p>","DOI":"10.3390\/s16040493","type":"journal-article","created":{"date-parts":[[2016,4,7]],"date-time":"2016-04-07T11:52:48Z","timestamp":1460029968000},"page":"493","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Building an Enhanced Vocabulary of the Robot Environment with a Ceiling Pointing Camera"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6738-3382","authenticated-orcid":false,"given":"Alejandro","family":"Rituerto","sequence":"first","affiliation":[{"name":"Instituto de Investigaci\u00f3n en Ingenier\u00eda de Arag\u00f3n, Departamento de Inform\u00e1tica e Ingenier\u00eda de Sistemas, University of Zaragoza, Zaragoza 50018, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Henrik","family":"Andreasson","sequence":"additional","affiliation":[{"name":"Centre for Applied Autonomous Sensor Systems, Department of Technology, \u00d6rebro University, \u00d6rebro SE-70182, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ana","family":"Murillo","sequence":"additional","affiliation":[{"name":"Instituto de Investigaci\u00f3n en Ingenier\u00eda de Arag\u00f3n, Departamento de Inform\u00e1tica e Ingenier\u00eda de Sistemas, University of Zaragoza, Zaragoza 50018, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0217-9326","authenticated-orcid":false,"given":"Achim","family":"Lilienthal","sequence":"additional","affiliation":[{"name":"Centre for Applied Autonomous Sensor Systems, 
Department of Technology, \u00d6rebro University, \u00d6rebro SE-70182, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9","family":"Guerrero","sequence":"additional","affiliation":[{"name":"Instituto de Investigaci\u00f3n en Ingenier\u00eda de Arag\u00f3n, Departamento de Inform\u00e1tica e Ingenier\u00eda de Sistemas, University of Zaragoza, Zaragoza 50018, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2016,4,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sivic, J., and Zisserman, A. (2003, January 13\u201316). Video Google: A text retrieval approach to object matching in videos. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Nice, France.","DOI":"10.1109\/ICCV.2003.1238663"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. (2007, January 18\u201323). Object retrieval with large vocabularies and fast spatial matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA.","DOI":"10.1109\/CVPR.2007.383172"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1007\/s11263-009-0285-2","article-title":"Improving bag-of-features for large scale image search","volume":"87","author":"Douze","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1007\/s11263-009-0271-8","article-title":"Unsupervised object discovery: A comparison","volume":"88","author":"Tuytelaars","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1100","DOI":"10.1177\/0278364910385483","article-title":"Appearance-only SLAM at large scale with FAB-MAP 2.0","volume":"30","author":"Cummins","year":"2011","journal-title":"Int. J. Robot. 
Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1460","DOI":"10.1016\/j.robot.2013.07.008","article-title":"Learning spatially semantic representations for cognitive robot navigation","volume":"61","author":"Kostavelis","year":"2013","journal-title":"Robot. Auton. Syst."},{"key":"ref_7","first-page":"249","article-title":"Real-world issues in warehouse navigation","volume":"2352","author":"Everett","year":"1995","journal-title":"Photonics Ind. Appl. Int. Soc. Opt. Photonics"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/S0004-3702(99)00070-3","article-title":"Experiences with an interactive museum tour-guide robot","volume":"114","author":"Burgard","year":"1999","journal-title":"Artif. Intell."},{"key":"ref_9","unstructured":"Fukuda, T., Yokoyama, Y., Arai, F., Shimojima, K., Ito, S., Abe, Y., Tanaka, K., and Tanaka, Y. (1996, January 22\u201328). Navigation system based on ceiling landmark recognition for autonomous mobile robot-position\/orientation control by landmark recognition with plus and minus primitives. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Minneapolis, MN, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wulf, O., Lecking, D., and Wagner, B. (2006, January 9\u201315). Robust self-localization in industrial environments based on 3D ceiling structures. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Beijing, China.","DOI":"10.1109\/IROS.2006.281984"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Konolige, K., and Bowman, J. (2009, January 9\u201315). Towards lifelong visual maps. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), St. 
Louis, MO, USA.","DOI":"10.1109\/IROS.2009.5354121"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/j.robot.2009.09.010","article-title":"SIFT, SURF & seasons: Appearance-based long-term localization in outdoor environments","volume":"58","author":"Valgren","year":"2010","journal-title":"Robot. Auton. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/S0921-8890(02)00356-1","article-title":"A meta-learning approach to ground symbols from visual percepts","volume":"43","author":"Bredeche","year":"2003","journal-title":"Robot. Auton. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1007\/s00138-009-0217-8","article-title":"Multimedia translation for linking visual data to semantics in videos","volume":"22","author":"Duygulu","year":"2011","journal-title":"Mach. Vis. Appl."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1016\/j.robot.2012.10.002","article-title":"Semantic labeling for indoor topological mapping using a wearable catadioptric system","volume":"62","author":"Rituerto","year":"2014","journal-title":"Robot. Auton. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"6734","DOI":"10.3390\/s140406734","article-title":"Object Detection Techniques Applied on Mobile Robot Semantic Navigation","volume":"14","author":"Astua","year":"2014","journal-title":"Sensors"},{"key":"ref_17","unstructured":"Arandjelovi\u0107, R., and Zisserman, A. (2015). Asian Conference on Computer Vision (ACCV), Springer."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1617","DOI":"10.1109\/TIE.2009.2012457","article-title":"Ceiling-based visual positioning for an indoor mobile robot with monocular vision","volume":"56","author":"Xu","year":"2009","journal-title":"IEEE Trans. Ind. 
Electron."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"4804","DOI":"10.1109\/TIE.2011.2109333","article-title":"Monocular vision-based SLAM in indoor environment using corner, lamp, and door features from upward-looking camera","volume":"58","author":"Hwang","year":"2011","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_20","unstructured":"Vieira, M., Faria, D.R., and Nunes, U. (2016). Robot 2015: Second Iberian Robotics Conference, Springer International Publishing."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Kani, S., and Miura, J. (2015, January 11\u201313). Mobile monitoring of physical states of indoor environments for personal support. Proceedings of the IEEE\/SICE International Symposium on System Integration (SII), Nagoya, Japan.","DOI":"10.1109\/SII.2015.7404951"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Mantha, B.R., Feng, C., Menassa, C.C., and Kamat, V.R. (2015, January 15\u201318). Real-time building energy and comfort parameter data collection using mobile indoor robots. Proceedings of the International Symposium on Automation and Robotics in Construction (ISARC), Oulu, Finland.","DOI":"10.22260\/ISARC2015\/0086"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.robot.2014.11.009","article-title":"Vision-based topological mapping and localization methods: A survey","volume":"64","author":"Ortiz","year":"2015","journal-title":"Robot. Auton. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1007\/s00138-013-0527-8","article-title":"Evaluating multimedia features and fusion for example-based event detection","volume":"25","author":"Myers","year":"2014","journal-title":"Mach. Vis. Appl."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Irschara, A., Zach, C., Frahm, J.M., and Bischof, H. (2009, January 20\u201325). From structure-from-motion point clouds to fast location recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206587"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1007\/s11263-012-0600-1","article-title":"Learning vocabularies over a fine quantization","volume":"103","author":"Mikulik","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A. (2008, January 23\u201328). Lost in quantization: Improving particular object retrieval in large scale image databases. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587635"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Boiman, O., Shechtman, E., and Irani, M. (2008, January 23\u201328). In defense of nearest-neighbor based image classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587598"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yang, L., Jin, R., Sukthankar, R., and Jurie, F. (2008, January 23\u201328). Unifying discriminative visual codebook generation with classifier training for object category recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587504"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3249","DOI":"10.1016\/j.patcog.2013.05.001","article-title":"Joint learning and weighting of visual vocabulary for bag-of-feature based tissue classification","volume":"46","author":"Wang","year":"2013","journal-title":"Pattern Recognit."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1007\/s00138-012-0473-x","article-title":"Informative patches sampling for image classification by utilizing bottom-up and top-down information","volume":"24","author":"Bai","year":"2013","journal-title":"Mach. Vis. Appl."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"897","DOI":"10.1016\/j.patcog.2011.07.021","article-title":"Supervised learning of Gaussian mixture models for visual vocabulary generation","volume":"45","author":"Fernando","year":"2012","journal-title":"Pattern Recognit."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cao, Y., Wang, C., Li, Z., Zhang, L., and Zhang, L. (2010, January 13\u201318). Spatial-bag-of-features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540021"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Ji, R., Yao, H., Sun, X., Zhong, B., and Gao, W. (2010, January 13\u201318). Towards semantic embedding in visual vocabulary. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540118"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chum, O., Perdoch, M., and Matas, J. (2009, January 20\u201325). Geometric min-hashing: Finding a (thick) needle in a haystack. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206531"},{"key":"ref_36","unstructured":"Yang, Y., and Newsam, S. (2011, January 6\u201313). Spatial pyramid co-occurrence for image classification. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1039","DOI":"10.1016\/j.patcog.2012.07.024","article-title":"Bag of spatio-visual words for context inference in scene classification","volume":"46","author":"Bolovinou","year":"2013","journal-title":"Pattern Recognit."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/j.patcog.2013.08.012","article-title":"Visual word spatial arrangement for image retrieval and classification","volume":"47","author":"Penatti","year":"2014","journal-title":"Pattern Recognit."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Jegou, H., Douze, M., and Schmid, C. (2008, January 12\u201318). Hamming embedding and weak geometric consistency for large scale image search. Proceedings of the European Conference on Computer Vision (ECCV), Marseille, France.","DOI":"10.1007\/978-3-540-88682-2_24"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1145\/2185520.2185597","article-title":"What makes Paris look like Paris?","volume":"31","author":"Doersch","year":"2012","journal-title":"ACM Trans. Graph."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Berg, T.L., and Berg, A.C. (2009, January 20\u201325). Finding iconic images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5204174"},{"key":"ref_42","unstructured":"Fergus, R., Perona, P., and Zisserman, A. (2003, January 18\u201320). Object class recognition by unsupervised scale-invariant learning. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Madison, WI, USA."},{"key":"ref_43","unstructured":"Russell, B.C., Freeman, W.T., Efros, A.A., Sivic, J., and Zisserman, A. (2006, January 17\u201322). Using multiple segmentations to discover objects and their extent in image collections. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, USA."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Singh, S., Gupta, A., and Efros, A.A. (2012, January 7\u201313). Unsupervised discovery of mid-level discriminative patches. Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy.","DOI":"10.1007\/978-3-642-33709-3_6"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Liu, J., Yang, Y., and Shah, M. (2009, January 20\u201325). Learning semantic visual vocabularies using diffusion distance. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206845"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cviu.2007.09.014","article-title":"Speeded-Up Robust Features (SURF)","volume":"110","author":"Bay","year":"2008","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1080\/01621459.1963.10500845","article-title":"Hierarchical grouping to optimize an objective function","volume":"58","author":"Ward","year":"1963","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_48","first-page":"1409","article-title":"A statistical method for evaluating systematic relationships","volume":"6","author":"Sokal","year":"1958","journal-title":"Univ. Kansas Sci. Bull."},{"key":"ref_49","unstructured":"Ester, M., Kriegel, H.P., Sander, J., and Xu, X. (1996, January 2\u20134). A density-based algorithm for discovering clusters in large spatial databases with noise. 
Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), Portland, OR, USA."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Achtert, E., Kriegel, H.P., Schubert, E., and Zimek, A. (2013, January 22\u201327). Interactive data mining with 3D-parallel-coordinate-trees. Proceedings of the ACM Conference on Special Interest Group on Management of Data (SIGMOD), New York, NY, USA.","DOI":"10.1145\/2463676.2463696"},{"key":"ref_51","unstructured":"Rituerto, A., Andreasson, H., Murillo, A.C., Lilienthal, A., and Guerrero, J.J. Hierarchical Vocabulary\u2014Evaluation Data. Available online: http:\/\/aass.oru.se\/Research\/Learning\/datasets.html."},{"key":"ref_52","unstructured":"Kuemmerle, R., Grisetti, G., Strasdat, H., Konolige, K., and Burgard, W. g2o: A General Framework for Graph Optimization. Available online: http:\/\/openslam.org\/g2o.html."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/16\/4\/493\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T19:21:53Z","timestamp":1760210513000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/16\/4\/493"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,4,7]]},"references-count":52,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2016,4]]}},"alternative-id":["s16040493"],"URL":"https:\/\/doi.org\/10.3390\/s16040493","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2016,4,7]]}}}