{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T16:42:23Z","timestamp":1771519343969,"version":"3.50.1"},"reference-count":75,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,9,6]],"date-time":"2022-09-06T00:00:00Z","timestamp":1662422400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Natural Sciences and Engineering Research Council (NSERC) of Canada","award":["RGPIN-2017-06261"],"award-info":[{"award-number":["RGPIN-2017-06261"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>The technological advances in computational systems have enabled very complex computer vision and machine learning approaches to perform efficiently and accurately. These new approaches can be considered a new set of tools to reshape the visual SLAM solutions. We present an investigation of the latest neuroscientific research that explains how the human brain can accurately navigate and map unknown environments. The accuracy suggests that human navigation is not affected by traditional visual odometry drifts resulting from tracking visual features. It utilises the geometrical structures of the surrounding objects within the navigated space. The identified objects and space geometrical shapes anchor the estimated space representation and mitigate the overall drift. Inspired by the human brain\u2019s navigation techniques, this paper presents our efforts to incorporate two machine learning techniques into a VSLAM solution: semantic segmentation and layout estimation to imitate human abilities to map new environments. The proposed system benefits from the geometrical relations between the corner points of the cuboid environments to improve the accuracy of trajectory estimation. 
Moreover, the implemented SLAM solution semantically groups the map points and then tracks each group independently to limit the system drift. The implemented solution yielded higher trajectory accuracy and immunity to large pure rotations.<\/jats:p>","DOI":"10.3390\/robotics11050091","type":"journal-article","created":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T04:18:32Z","timestamp":1662610712000},"page":"91","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Improved Visual SLAM Using Semantic Segmentation and Layout Estimation"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3641-4005","authenticated-orcid":false,"given":"Ahmed","family":"Mahmoud","sequence":"first","affiliation":[{"name":"Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada"}]},{"given":"Mohamed","family":"Atia","sequence":"additional","affiliation":[{"name":"Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"297","DOI":"10.3389\/fnhum.2018.00297","article-title":"Spatial Representations in the Human Brain","volume":"12","author":"Herweg","year":"2018","journal-title":"Front. Hum. 
Neurosci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1002\/hipo.22449","article-title":"Why vision is important to how we navigate","volume":"25","author":"Ekstrom","year":"2015","journal-title":"Hippocampus"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.neunet.2013.01.016","article-title":"Cognitive memory","volume":"41","author":"Widrow","year":"2013","journal-title":"Neural Netw."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.cobeha.2017.06.005","article-title":"Human spatial navigation: Representations across dimensions and scales","volume":"17","author":"Ekstrom","year":"2017","journal-title":"Curr. Opin. Behav. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1016\/S0893-6080(05)80159-5","article-title":"A model of hippocampal function","volume":"7","author":"Burgess","year":"1994","journal-title":"Neural Netw."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1258","DOI":"10.1126\/science.1099901","article-title":"Spatial Representation in the Entorhinal Cortex","volume":"305","author":"Fyhn","year":"2004","journal-title":"Science"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"758","DOI":"10.1126\/science.1125572","article-title":"Conjunctive Representation of Position, Direction, and Velocity in Entorhinal Cortex","volume":"312","author":"Sargolini","year":"2006","journal-title":"Science"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1038\/297681a0","article-title":"Place navigation impaired in rats with hippocampal lesions","volume":"297","author":"Morris","year":"1982","journal-title":"Nature"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1007\/s004220000172","article-title":"Predictions derived from modelling the hippocampal role in navigation","volume":"83","author":"Burgess","year":"2000","journal-title":"Biol. 
Cybern."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1016\/S0959-4388(99)80023-3","article-title":"Human spatial navigation: Cognitive maps, sexual dimorphism, and neural substrates","volume":"9","author":"Maguire","year":"1999","journal-title":"Curr. Opin. Neurobiol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.cogpsych.2005.08.003","article-title":"Spatial knowledge acquisition from direct experience in the environment: Individual differences in the development of metric knowledge and the integration of separately learned places","volume":"52","author":"Ishikawa","year":"2006","journal-title":"Cognit. Psychol."},{"key":"ref_12","first-page":"646","article-title":"Reference frames in virtual spatial navigation are viewpoint dependent","volume":"8","author":"Buchanan","year":"2014","journal-title":"Front. Hum. Neurosci."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1199","DOI":"10.1037\/0096-1523.31.6.1199","article-title":"Evidence of Separable Spatial Representations in a Virtual Navigation Task","volume":"31","author":"Gramann","year":"2005","journal-title":"J. Exp. Psychol. Hum. Percept. Perform."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.cognition.2012.05.006","article-title":"Retrieving enduring spatial representations after disorientation","volume":"124","author":"Li","year":"2012","journal-title":"Cognition"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/S0065-2407(08)60007-5","article-title":"The Development of Spatial Representations of Large-Scale Environments","volume":"Volume 10","author":"Siegel","year":"1975","journal-title":"Advances in Child Development and Behavior"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1037\/h0061626","article-title":"Cognitive maps in rats and men","volume":"55","author":"Tolman","year":"1948","journal-title":"Psychol. 
Rev."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1162\/jocn.1991.3.2.190","article-title":"\u201cDead Reckoning,\u201d Landmark Learning, and the Sense of Direction: A Neurophysiological and Computational Hypothesis","volume":"3","author":"McNaughton","year":"1991","journal-title":"J. Cogn. Neurosci."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1016\/S0028-3932(00)00140-8","article-title":"Path integration following temporal lobectomy in humans","volume":"39","author":"Worsley","year":"2001","journal-title":"Neuropsychologia"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1177\/001391657000200106","article-title":"Styles and methods of structuring a city","volume":"2","author":"Appleyard","year":"1970","journal-title":"Environ. Behav."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"91","DOI":"10.2307\/427643","article-title":"The Image of the City","volume":"21","author":"Chapman","year":"1962","journal-title":"J. Aesthet. Art Crit."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1106","DOI":"10.3758\/s13421-014-0418-x","article-title":"Different \u201croutes\u201d to a cognitive map: Dissociable forms of spatial knowledge derived from route and cartographic map learning","volume":"42","author":"Zhang","year":"2014","journal-title":"Mem. Cognit."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"803","DOI":"10.3389\/fnhum.2014.00803","article-title":"A critical review of the allocentric spatial representation and its neural underpinnings: Toward a network-based perspective","volume":"8","author":"Ekstrom","year":"2014","journal-title":"Front. Hum. Neurosci."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1016\/0010-0285(82)90019-6","article-title":"Differences in spatial knowledge acquired from maps and navigation","volume":"14","author":"Thorndyke","year":"1982","journal-title":"Cognit. 
Psychol."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1157","DOI":"10.1037\/0278-7393.15.6.1157","article-title":"Access to Knowledge of Spatial Structure at Novel Points of Observation","volume":"15","author":"Rieser","year":"1989","journal-title":"J. Exp. Psychol. Learn. Mem. Cogn."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1006\/cogp.2001.0758","article-title":"Systems of Spatial Reference in Human Memory","volume":"43","author":"Shelton","year":"2001","journal-title":"Cognit. Psychol."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"867","DOI":"10.1037\/0278-7393.32.4.867","article-title":"Transient and enduring spatial representations under disorientation and self-rotation","volume":"32","author":"Waller","year":"2006","journal-title":"J. Exp. Psychol. Learn. Mem. Cogn."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1017\/S0140525X00063949","article-title":"Pr\u00e9cis of O\u2019Keefe & Nadel\u2019s The hippocampus as a cognitive map","volume":"2","author":"Nadel","year":"1979","journal-title":"Behav. Brain Sci."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Klatzky, R.L. (1998). Allocentric and Egocentric Spatial Representations: Definitions, Distinctions, and Interconnections, Springer.","DOI":"10.1007\/3-540-69342-4_1"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1914","DOI":"10.1037\/a0032995","article-title":"Toward a definition of intrinsic axes: The effect of orthogonality and symmetry on the preferred direction of spatial memory","volume":"39","author":"Richard","year":"2013","journal-title":"J. Exp. Psychol. Learn. Mem. Cogn."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"589","DOI":"10.3758\/BF03196519","article-title":"Egocentric and geocentric frames of reference in memory of large-scale space","volume":"10","author":"McNamara","year":"2003","journal-title":"Psychon. Bull. 
Rev."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1037\/0278-7393.33.1.145","article-title":"Layout geometry in the selection of intrinsic frames of reference from multiple viewpoints","volume":"33","author":"Mou","year":"2007","journal-title":"J. Exp. Psychol. Learn. Mem. Cogn."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"565","DOI":"10.3389\/fpsyg.2013.00565","article-title":"Reference frames in allocentric representations are invariant across static and active encoding","volume":"4","author":"Chan","year":"2013","journal-title":"Front. Psychol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1177\/0956797611429467","article-title":"Is the map in our head oriented north?","volume":"23","author":"Frankenstein","year":"2012","journal-title":"Psychol. Sci."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/S0010-0277(00)00105-0","article-title":"Updating egocentric representations in human navigation","volume":"77","author":"Wang","year":"2000","journal-title":"Cognition"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1111\/j.1467-9280.1997.tb00442.x","article-title":"Viewpoint dependence in scene recognition","volume":"8","author":"Diwadkar","year":"1997","journal-title":"Psychol. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"851","DOI":"10.1037\/xlm0000346","article-title":"Multiple views of space: Continuous visual flow enhances small-scale spatial learning","volume":"43","author":"Holmes","year":"2017","journal-title":"J. Exp. Psychol. Learn. Mem. 
Cogn."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1007\/BF00450672","article-title":"Homing by path integration in a mammal","volume":"67","author":"Mittelstaedt","year":"1980","journal-title":"Naturwissenschaften"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1538","DOI":"10.1016\/j.cub.2009.07.053","article-title":"Walking Straight into Circles","volume":"19","author":"Souman","year":"2009","journal-title":"Curr. Biol."},{"key":"ref_39","first-page":"365","article-title":"Allocentric Spatial Learning by Hippocampectomised Rats: A Further Test of the \u201cSpatial Mapping\u201d and \u201cWorking Memory\u201d Theories of Hippocampal Function","volume":"38","author":"Morris","year":"1986","journal-title":"Q. J. Exp. Psychol. Sect. B"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"910","DOI":"10.3758\/BF03193465","article-title":"Landmarks as beacons and associative cues: Their role in route learning","volume":"35","author":"Waller","year":"2007","journal-title":"Mem. Cognit."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1465","DOI":"10.1523\/JNEUROSCI.09-05-01465.1989","article-title":"Differential effects of fornix and caudate nucleus lesions on two radial maze tasks: Evidence for multiple memory systems","volume":"9","author":"Packard","year":"1989","journal-title":"J. Neurosci."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1146\/annurev.neuro.25.112701.142937","article-title":"Learning and memory functions of the basal ganglia","volume":"25","author":"Packard","year":"2002","journal-title":"Annu. Rev. Neurosci."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1006\/nlme.2001.4008","article-title":"Multiple parallel memory systems in the brain of the rat","volume":"77","author":"White","year":"2002","journal-title":"Neurobiol. Learn. 
Mem."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Dasgupta, S., Fang, K., Chen, K., and Savarese, S. (2016, January 27\u201330). DeLay: Robust Spatial Layout Estimation for Cluttered Indoor Scenes. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.73"},{"key":"ref_45","first-page":"1068","article-title":"Review on Room Layout Estimation from a Single Image","volume":"9","author":"Mathew","year":"2020","journal-title":"Int. J. Eng. Res."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1921","DOI":"10.1007\/s11042-021-11358-1","article-title":"Room layout estimation in indoor environment: A review","volume":"81","author":"Mohan","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Lee, C.-Y., Badrinarayanan, V., Malisiewicz, T., and Rabinovich, A. (2017). RoomNet: End-to-End Room Layout Estimation. arXiv.","DOI":"10.1109\/ICCV.2017.521"},{"key":"ref_48","unstructured":"Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., and Xiao, J. (2015). LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop 2016. arXiv."},{"key":"ref_49","unstructured":"Coughlan, J.M., and Yuille, A.L. (2001, January 9\u201311). The manhattan world assumption: Regularities in scene statistics which enable Bayesian inference. Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Carreira, J., Agrawal, P., Fragkiadaki, K., and Malik, J. (2016, January 27\u201330). 
Human pose estimation with iterative error feedback. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.512"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Pfister, T., Charles, J., and Zisserman, A. (2015, January 7\u201313). Flowing ConvNets for Human Pose Estimation in Videos 2015. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.222"},{"key":"ref_53","unstructured":"Tompson, J., Jain, A., LeCun, Y., and Bregler, C. (2014, January 18\u201320). Joint training of a convolutional network and a graphical model for human pose estimation. Proceedings of the Advances in Neural Information Processing Systems, Bangkok, Thailand."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Wu, J., Xue, T., Lim, J.J., Tian, Y., Tenenbaum, J.B., Torralba, A., and Freeman, W.T. (2016, January 8\u201316). Single image 3D interpreter network. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46466-4_22"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1007\/s12555-018-0130-x","article-title":"Simultaneous Localization and Mapping in the Epoch of Semantics: A Survey","volume":"17","author":"Sualeh","year":"2019","journal-title":"Int. J. Control Autom. Syst."},{"key":"ref_56","unstructured":"Bowman, S.L., Atanasov, N., Daniilidis, K., and Pappas, G.J. (June, January 29). Probabilistic data association for semantic SLAM. Proceedings of the IEEE International Conference on Robotics and Automation, Singapore."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., Pollefeys, M., Geiger, A., and Sattler, T. (2018, January 18\u201323). Semantic Visual Localization.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00721"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1109\/TRO.2012.2197158","article-title":"Bags of Binary Words for Fast Place Recognition in Image Sequences","volume":"28","author":"Juan","year":"2012","journal-title":"IEEE Trans. Robot."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Kaneko, M., Iwami, K., Ogawa, T., Yamasaki, T., and Aizawa, K. (2018, January 18\u201322). Mask-SLAM: Robust feature-based monocular SLAM by masking using semantic segmentation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00063"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Yu, C., Liu, Z., Liu, X.-J., Xie, F., Yang, Y., Wei, Q., and Fei, Q. (2018, January 1\u20135). DS-SLAM: A Semantic Visual SLAM towards Dynamic Environments. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593691"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1016\/j.isprsjprs.2020.05.012","article-title":"SLAM integrated mobile mapping system in complex urban environments","volume":"166","author":"Li","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Yuan, X., and Chen, S. (2020\u201324, January 24). SaD-SLAM: A Visual SLAM Based on Semantic and Depth Information. 
Proceedings of the 2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA.","DOI":"10.1109\/IROS45743.2020.9341180"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1109\/TPAMI.2018.2844175","article-title":"Mask R-CNN","volume":"42","author":"He","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Qiu, Y., Wang, C., Wang, W., Henein, M., and Scherer, S. (2022, January 23\u201327). AirDOS: Dynamic SLAM benefits from Articulated Objects. Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA.","DOI":"10.1109\/ICRA46639.2022.9811667"},{"key":"ref_66","unstructured":"Casser, V., Pirk, S., Mahjourian, R., and Angelova, A. (February, January 27). Depth prediction without the sensors: Leveraging structure for unsupervised learning from monocular videos. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/LRA.2018.2866205","article-title":"QuadricSLAM: Dual quadrics from object detections as landmarks in object-oriented SLAM","volume":"4","author":"Nicholson","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Hosseinzadeh, M., Latif, Y., Pham, T., Suenderhauf, N., and Reid, I. (2018, January 2\u20136). Structure Aware SLAM Using Quadrics and Planes. Proceedings of the 14th Asian Conference on Computer Vision (ACCV), Perth, Australia.","DOI":"10.1007\/978-3-030-20893-6_26"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Runz, M., Buffier, M., and Agapito, L. (2018, January 16\u201320). MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects. 
Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2018, Munich, Germany.","DOI":"10.1109\/ISMAR.2018.00024"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"McCormac, J., Clark, R., Bloesch, M., Davison, A., and Leutenegger, S. (2018, January 5\u20138). Fusion++: Volumetric object-level SLAM. Proceedings of the 2018 International Conference on 3D Vision, 3DV 2018, Verona, Italy.","DOI":"10.1109\/3DV.2018.00015"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Wang, Y., and Zell, A. (2018, January 12\u201314). Improving Feature-based Visual SLAM by Semantics. Proceedings of the IEEE 3rd International Conference on Image Processing, Applications and Systems, IPAS 2018, Sophia Antipolis, France.","DOI":"10.1109\/IPAS.2018.8708875"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"1255","DOI":"10.1109\/TRO.2017.2705103","article-title":"ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras","volume":"33","author":"Tardos","year":"2017","journal-title":"IEEE Trans. Robot."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Deitke, M., Han, W., Herrasti, A., Kembhavi, A., Kolve, E., Mottaghi, R., Salvador, J., Schwenk, D., VanderBilt, E., and Wallingford, M. (2020, January 13\u201319). RoboTHOR: An Open Simulation-to-Real Embodied AI Platform.
Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00323"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Ehsani, K., Han, W., Herrasti, A., VanderBilt, E., Weihs, L., Kolve, E., Kembhavi, A., and Mottaghi, R. (2021, January 20\u201325). ManipulaTHOR: A Framework for Visual Object Manipulation. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00447"}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/11\/5\/91\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:24:15Z","timestamp":1760142255000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/11\/5\/91"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,6]]},"references-count":75,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["robotics11050091"],"URL":"https:\/\/doi.org\/10.3390\/robotics11050091","relation":{},"ISSN":["2218-6581"],"issn-type":[{"value":"2218-6581","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,6]]}}}