{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:08:50Z","timestamp":1773842930868,"version":"3.50.1"},"reference-count":209,"publisher":"SAGE Publications","issue":"10","license":[{"start":{"date-parts":[[2024,2,12]],"date-time":"2024-02-12T00:00:00Z","timestamp":1707696000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"ARL DCIST","award":["CRA W911NF-17-2-0181"],"award-info":[{"award-number":["CRA W911NF-17-2-0181"]}]},{"name":"ONR RAIDER","award":["N00014-18-1-2828"],"award-info":[{"award-number":["N00014-18-1-2828"]}]},{"name":"MIT Lincoln Laboratory\u2019s Autonomy al Fresco Program"},{"name":"Luca Carlone\u2019s Amazon Research Award"},{"name":"Lockheed Martin Corporation\u2019s Neural Prediction in 3D Dynamic Scene Graphs program"},{"name":"Artificial Intelligence Accelerator","award":["CRA FA8750-19-2-1000"],"award-info":[{"award-number":["CRA FA8750-19-2-1000"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:p> 3D spatial perception is the problem of building and maintaining an actionable and persistent representation of the environment in real-time using sensor data and prior knowledge. Despite the fast-paced progress in robot perception, most existing methods either build purely geometric maps (as in traditional SLAM) or \u201cflat\u201d metric-semantic maps that do not scale to large environments or large dictionaries of semantic labels. The first part of this paper is concerned with representations: we show that scalable representations for spatial perception need to be hierarchical in nature. 
Hierarchical representations are efficient to store, and lead to layered graphs with small treewidth, which enable provably efficient inference. We then introduce an example of hierarchical representation for indoor environments, namely a 3D scene graph, and discuss its structure and properties. The second part of the paper focuses on algorithms to incrementally construct a 3D scene graph as the robot explores the environment. Our algorithms combine 3D geometry (e.g., to cluster the free space into a graph of places), topology (to cluster the places into rooms), and geometric deep learning (e.g., to classify the type of rooms the robot is moving across). The third part of the paper focuses on algorithms to maintain and correct 3D scene graphs during long-term operation. We propose hierarchical descriptors for loop closure detection and describe how to correct a scene graph in response to loop closures, by solving a 3D scene graph optimization problem. We conclude the paper by combining the proposed perception algorithms into Hydra, a real-time spatial perception system that builds a 3D scene graph from visual-inertial data in real-time. We showcase Hydra\u2019s performance in photo-realistic simulations and real data collected by a Clearpath Jackal robot and a Unitree A1 robot. We release an open-source implementation of Hydra at https:\/\/github.com\/MIT-SPARK\/Hydra . 
<\/jats:p>","DOI":"10.1177\/02783649241229725","type":"journal-article","created":{"date-parts":[[2024,2,12]],"date-time":"2024-02-12T23:30:46Z","timestamp":1707780646000},"page":"1457-1505","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":49,"title":["Foundations of spatial perception for robotics: Hierarchical representations and real-time systems"],"prefix":"10.1177","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1201-7032","authenticated-orcid":false,"given":"Nathan","family":"Hughes","sequence":"first","affiliation":[{"name":"Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2829-5256","authenticated-orcid":false,"given":"Yun","family":"Chang","sequence":"additional","affiliation":[{"name":"Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8970-3962","authenticated-orcid":false,"given":"Siyi","family":"Hu","sequence":"additional","affiliation":[{"name":"Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6132-395X","authenticated-orcid":false,"given":"Rajat","family":"Talak","sequence":"additional","affiliation":[{"name":"Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-9834-3926","authenticated-orcid":false,"given":"Rumaia","family":"Abdulhai","sequence":"additional","affiliation":[{"name":"Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3978-9542","authenticated-orcid":false,"given":"Jared","family":"Strader","sequence":"additional","affiliation":[{"name":"Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1884-5397","authenticated-orcid":false,"given":"Luca","family":"Carlone","sequence":"additional","affiliation":[{"name":"Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, MA, USA"}]}],"member":"179","published-online":{"date-parts":[[2024,2,12]]},"reference":[{"key":"bibr1-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2017.12.004"},{"key":"bibr2-02783649241229725","unstructured":"Agia C, Jatavallabhula KM, Khodeir M, et al. (2022) Taskography: evaluating robot task planning over large 3D scene graphs. Conference on Robot Learning (CoRL), Auckland, New Zealand, 14\u201318 December 2022."},{"key":"bibr3-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1007\/s41109-019-0179-3"},{"key":"bibr4-02783649241229725","volume-title":"A Survey of Vectorization Methods in Topological Data Analysis","author":"Ali D","year":"2022"},{"key":"bibr5-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3230590"},{"key":"bibr6-02783649241229725","doi-asserted-by":"crossref","unstructured":"Anderson P, Fernando B, Johnson M, et al. (2016) Spice: semantic propositional image caption evaluation. European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11\u201313 October 2016.","DOI":"10.1007\/978-3-319-46454-1_24"},{"key":"bibr7-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2021.3094984"},{"key":"bibr8-02783649241229725","doi-asserted-by":"crossref","unstructured":"Arandjelovic R, Gronat P, Torii A, et al. 
(2016) NetVLAD: CNN architecture for weakly supervised place recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27\u201330 June 2016.","DOI":"10.1109\/CVPR.2016.572"},{"key":"bibr9-02783649241229725","doi-asserted-by":"crossref","unstructured":"Armeni I, He Z, Gwak J, et al. (2019) 3D scene graph: a structure for unified semantics, 3D space, and camera. International Conference on Computer Vision (ICCV), Seoul, Korea, 2 November 2019.","DOI":"10.1109\/ICCV.2019.00576"},{"key":"bibr10-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-76298-0_52"},{"key":"bibr11-02783649241229725","doi-asserted-by":"crossref","unstructured":"Bavle H, Sanchez-Lopez JL, Shaheer M, et al. (2022a) S-graphs+: real-time localization and mapping leveraging hierarchical representations. arXiv preprint arXiv:2212.11770.","DOI":"10.1109\/LRA.2023.3290512"},{"key":"bibr12-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3189785"},{"key":"bibr13-02783649241229725","unstructured":"Becker A, Geiger D (1996) A sufficiently fast algorithm for finding close to optimal junction trees. Conference on Uncertainty in Artificial Intelligence (UAI), Portland, OR, 1\u20134 August 1996."},{"key":"bibr14-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1177\/0278364909100586"},{"key":"bibr15-02783649241229725","doi-asserted-by":"crossref","unstructured":"Beetz M, Be\u00dfler D, Haidu A, et al. (2018) KnowRob 2.0\u2014a 2nd generation knowledge processing framework for cognition-enabled robotic agents. 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21\u201325 May 2018.","DOI":"10.1109\/ICRA.2018.8460964"},{"key":"bibr16-02783649241229725","doi-asserted-by":"crossref","unstructured":"Behley J, Garbade M, Milioto A, et al. (2019) SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences. 
International Conference on Computer Vision (ICCV), Seoul, Korea, 2 November 2019.","DOI":"10.1109\/ICCV.2019.00939"},{"key":"bibr17-02783649241229725","doi-asserted-by":"crossref","unstructured":"Berg M, Konidaris G, Tellex S (2022) Using language to generate state abstractions for long-range planning in outdoor environments. In: IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23\u201327 May 2022.","DOI":"10.1109\/ICRA46639.2022.9812355"},{"key":"bibr18-02783649241229725","volume-title":"A library for nearest neighbor (NN) with kd-trees","author":"Blanco JL","year":"2014"},{"key":"bibr19-02783649241229725","first-page":"105","volume":"3","author":"Bodlaender HL","year":"1988","journal-title":"Automata, Languages and Programming"},{"key":"bibr20-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1007\/11917496_1"},{"key":"bibr21-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2009.03.008"},{"key":"bibr22-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2011.04.003"},{"key":"bibr23-02783649241229725","doi-asserted-by":"crossref","unstructured":"Bollacker K, Evans C, Paritosh P, et al. (2008) Freebase: a collaboratively created graph database for structuring human knowledge. Proceedings of the ACM SIGMOD International Conference on Management of Data, Houston, TX, USA, 10\u201315 June 2008.","DOI":"10.1145\/1376616.1376746"},{"key":"bibr24-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2016.7487234"},{"key":"bibr25-02783649241229725","volume-title":"Construction of Engineering Ontologies for Knowledge Sharing and Reuse","author":"Borst WN","year":"1997"},{"key":"bibr26-02783649241229725","doi-asserted-by":"crossref","unstructured":"Bowman S, Atanasov N, Daniilidis K, et al. (2017) Probabilistic data association for semantic SLAM. 
IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May 2017.","DOI":"10.1109\/ICRA.2017.7989203"},{"key":"bibr27-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2693418"},{"key":"bibr28-02783649241229725","volume-title":"Relational Graph Attention Networks","author":"Busbridge D","year":"2019"},{"key":"bibr29-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2016.2624754"},{"key":"bibr30-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v24i1.7519"},{"key":"bibr31-02783649241229725","unstructured":"Chandrasekaran V, Srebro N, Harsha P (2008) Complexity of inference in graphical models. Conference on Uncertainty in Artificial Intelligence (UAI), Helsinki, Finland, 9\u201312 July 2008."},{"key":"bibr32-02783649241229725","doi-asserted-by":"crossref","unstructured":"Chang A, Dai A, Funkhouser T, et al. (2017) Matterport3d: learning from rgb-d data in indoor environments. International Conference on 3D Vision (3DV), Qingdao, China, 10\u201312 October 2017.","DOI":"10.1109\/3DV.2017.00081"},{"key":"bibr33-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3191204"},{"key":"bibr34-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3137605"},{"key":"bibr35-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2023.3320011"},{"key":"bibr36-02783649241229725","doi-asserted-by":"crossref","unstructured":"Chatila R, Laumond JP (1985) Position referencing and consistent world modeling for mobile robots. IEEE International Conference on Robotics and Automation (ICRA), St. Louis, Missouri, USA, 25\u201328 March 1985.","DOI":"10.1109\/ROBOT.1985.1087373"},{"key":"bibr37-02783649241229725","doi-asserted-by":"crossref","unstructured":"Chen H, Tan H, Kuntz A, et al. (2020) Enabling robots to understand incomplete natural language instructions using commonsense reasoning. 
In: IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 August 2020.","DOI":"10.1109\/ICRA40945.2020.9197315"},{"key":"bibr38-02783649241229725","volume-title":"Leveraging Large (Visual) Language Models for Robot 3d Scene Understanding","author":"Chen W","year":"2022"},{"key":"bibr39-02783649241229725","doi-asserted-by":"crossref","unstructured":"Chen Z, Rezayi S, Li S (2023) More knowledge, less bias: unbiasing scene graph generation with explicit ontological adjustment. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2\u20137 January 2023.","DOI":"10.1109\/WACV56688.2023.00401"},{"key":"bibr40-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/70.928558"},{"key":"bibr41-02783649241229725","unstructured":"Chua J (2018) Probabilistic Scene Grammars: A General-Purpose Framework for Scene Understanding. Providence, RI: Brown University Thesis, 1\u2013146."},{"key":"bibr42-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(90)90060-D"},{"key":"bibr43-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.2965415"},{"key":"bibr44-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3054739"},{"key":"bibr45-02783649241229725","doi-asserted-by":"crossref","unstructured":"Daruna A, Nair L, Liu W, et al. (2021) Towards robust one-shot task execution using knowledge graph embeddings. IEEE International Conference on Robotics and Automation (ICRA). 
Yokohama, Japan, 5 June 2021.","DOI":"10.1109\/ICRA48506.2021.9561782"},{"key":"bibr46-02783649241229725","volume-title":"FutureMapping: The Computational Structure of Spatial AI Systems","author":"Davison AJ","year":"2018"},{"key":"bibr47-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2006.11.003"},{"key":"bibr48-02783649241229725","first-page":"3844","volume":"29","author":"Defferrard M","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr49-02783649241229725","doi-asserted-by":"publisher","DOI":"10.3390\/s19051166"},{"key":"bibr50-02783649241229725","doi-asserted-by":"crossref","unstructured":"Ding Y, Yu J, Liu B, et al. (2022) MuKEA: multimodal knowledge extraction and accumulation for knowledge-based visual question answering. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18 June 2022.","DOI":"10.1109\/CVPR52688.2022.00503"},{"key":"bibr51-02783649241229725","doi-asserted-by":"crossref","unstructured":"Dong J, Fei X, Soatto S (2017) Visual-Inertial-Semantic scene representation for 3D object detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21 July 2017.","DOI":"10.1109\/CVPR.2017.380"},{"key":"bibr52-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/167088.167245"},{"key":"bibr53-02783649241229725","unstructured":"Fey M, Lenssen JE (2019) Fast graph representation learning with PyTorch Geometric. 
International Conference on Learning Representations (ICLR) Workshop on Representation Learning on Graphs and Manifolds, Eindhoven, The Netherlands, 6 March 2019."},{"key":"bibr54-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/0010-0277(88)90031-5"},{"key":"bibr55-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1115\/1.1631582"},{"key":"bibr56-02783649241229725","first-page":"2109","volume-title":"International Joint Conference On AI (IJCAI)","author":"Friedman S","year":"2007"},{"key":"bibr57-02783649241229725","doi-asserted-by":"crossref","unstructured":"Furukawa Y, Curless B, Seitz SM, et al. (2009) Reconstructing building interiors from images. International Conference on Computer Vision (ICCV), Kyoto, Japan, 2 October 2009.","DOI":"10.1109\/ICCV.2009.5459145"},{"key":"bibr58-02783649241229725","doi-asserted-by":"crossref","unstructured":"Galindo C, Saffiotti A, Coradeschi S, et al. (2005) Multi-hierarchical semantic maps for mobile robotics. IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Edmonton, AB, Canada, 2\u20136 August 2005.","DOI":"10.1109\/IROS.2005.1545511"},{"key":"bibr59-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2012.2197158"},{"key":"bibr60-02783649241229725","volume-title":"A Review on Deep Learning Techniques Applied to Semantic Segmentation","author":"Garcia-Garcia A","year":"2017"},{"key":"bibr61-02783649241229725","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.44"},{"key":"bibr62-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3067633"},{"key":"bibr63-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2801879"},{"key":"bibr64-02783649241229725","first-page":"330","volume-title":"Asian Conference On Computer Vision (ACCV)","author":"Gay 
P","year":"2018"},{"key":"bibr65-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1090\/qam\/1939008"},{"key":"bibr66-02783649241229725","volume-title":"Logical Foundations of Artificial Intelligence","author":"Genesereth MR","year":"2012"},{"key":"bibr67-02783649241229725","volume-title":"3DP3: 3D Scene Perception via Probabilistic Programming","author":"Gothoskar N","year":"2021"},{"key":"bibr68-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2019.2923960"},{"key":"bibr69-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/3382082"},{"key":"bibr70-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1006\/ijhc.1995.1081"},{"key":"bibr71-02783649241229725","first-page":"1","volume":"1","author":"Guarino N","year":"2009","journal-title":"Handbook on ontologies"},{"key":"bibr72-02783649241229725","doi-asserted-by":"crossref","unstructured":"Guo Y, Gao L, Wang X, et al. (2021) From general to specific: informative scene graph generation via balance adjustment. International Conference on Computer Vision (ICCV), Montreal, Canada, 17 October 2021.","DOI":"10.1109\/ICCV48922.2021.01607"},{"key":"bibr73-02783649241229725","unstructured":"Ha H, Song S (2022) Semantic abstraction: open-world 3d scene understanding from 2d vision-language models. 6th Annual Conference on Robot Learning, Auckland, New Zealand, 14\u201318 December 2022."},{"key":"bibr74-02783649241229725","unstructured":"Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4\u20139 December 2017."},{"key":"bibr75-02783649241229725","doi-asserted-by":"crossref","unstructured":"Hao J, Chen M, Yu W, et al. (2019) Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts. 
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4\u20138 August 2019.","DOI":"10.1145\/3292500.3330838"},{"key":"bibr76-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/0167-2789(90)90087-6"},{"key":"bibr77-02783649241229725","volume-title":"Deep Convolutional Networks on Graph-Structured Data","author":"Henaff M","year":"2015"},{"key":"bibr78-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-658-32182-6_13"},{"key":"bibr79-02783649241229725","doi-asserted-by":"crossref","unstructured":"Hughes N, Chang Y, Carlone L (2022) Hydra: a real-time spatial perception engine for 3D scene graph construction and optimization. Robotics: science and systems (RSS), New York City, 27 June 2022.","DOI":"10.15607\/RSS.2022.XVIII.050"},{"key":"bibr80-02783649241229725","volume":"43","author":"Ichien N","year":"2021","journal-title":"Annual Meeting of the Cognitive Science Society"},{"key":"bibr81-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196910"},{"key":"bibr82-02783649241229725","volume-title":"Scene understanding and distribution modeling with mixed-integer scene parsing","author":"Izatt G","year":"2021"},{"key":"bibr83-02783649241229725","doi-asserted-by":"crossref","unstructured":"Jain J, Li J, Chiu M, et al. (2023) OneFormer: one transformer to rule universal image segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17 June 2023.","DOI":"10.1109\/CVPR52729.2023.00292"},{"key":"bibr84-02783649241229725","unstructured":"James S, Rosman B, Konidaris G (2020) Learning portable representations for high-level planning. International Conference on Machine Learning (ICML), Vienna, Austria, 18 Jul 2020."},{"key":"bibr85-02783649241229725","unstructured":"James S, Rosman B, Konidaris G (2022) Autonomous learning of object-centric abstractions for high-level planning. 
International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 29 April 2022."},{"key":"bibr86-02783649241229725","volume-title":"ConceptFusion: Open-Set Multimodal 3D Mapping","author":"Jatavallabhula KM","year":"2023"},{"key":"bibr87-02783649241229725","doi-asserted-by":"crossref","unstructured":"Jensen FV, Jensen F (1994) Optimal junction trees. Conference on Uncertainty in Artificial Intelligence (UAI), Seattle, Washington, USA, 29\u201331 July 1994.","DOI":"10.1016\/B978-1-55860-332-5.50050-X"},{"key":"bibr88-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/MITP.2009.105"},{"key":"bibr89-02783649241229725","unstructured":"Jinnai Y, Abel D, Hershkowitz D, et al. (2019) Finding options that minimize planning time. International Conference on Machine Learning (ICML), Long Beach, CA, USA, 15 June 2019."},{"key":"bibr90-02783649241229725","doi-asserted-by":"crossref","unstructured":"Johnson J, Krishna R, Stark M, et al. (2015) Image retrieval using scene graphs. IEEE Conference on Computer Vision And Pattern Recognition (CVPR), Boston, MA, USA, 7\u201312 June 2015.","DOI":"10.1109\/CVPR.2015.7298990"},{"key":"bibr91-02783649241229725","volume-title":"An Introduction to Probabilistic Graphical Models","author":"Jordan M","year":"2002"},{"key":"bibr92-02783649241229725","doi-asserted-by":"crossref","unstructured":"Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7\u201312 June 2015.","DOI":"10.1109\/CVPR.2015.7298932"},{"issue":"12","key":"bibr93-02783649241229725","first-page":"1","volume":"50","author":"Kim U","year":"2019","journal-title":"IEEE Transactions on Cybernetics"},{"key":"bibr94-02783649241229725","unstructured":"Kipf T, Welling M (2017) Semi-supervised classification with graph convolutional networks. 
International Conference on Learning Representations (ICLR), Toulon, France, 24\u201326 April 2017."},{"key":"bibr95-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8206429"},{"key":"bibr96-02783649241229725","volume-title":"Probabilistic Graphical Models: Principles and Techniques","author":"Koller D","year":"2009"},{"key":"bibr97-02783649241229725","volume-title":"vMAP: Vectorised Object Mapping for Neural Field SLAM","author":"Kong X","year":"2023"},{"key":"bibr98-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.cobeha.2018.11.005"},{"key":"bibr99-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1613\/jair.5575"},{"key":"bibr100-02783649241229725","volume-title":"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations","author":"Krishna R","year":"2016"},{"key":"bibr101-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog0202_3"},{"key":"bibr102-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(00)00017-5"},{"key":"bibr103-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3191194"},{"key":"bibr104-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1017\/S0140525X16001837"},{"key":"bibr105-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2012.08.010"},{"key":"bibr106-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/3363574"},{"key":"bibr107-02783649241229725","doi-asserted-by":"crossref","unstructured":"Lemaignan S, Ros R, M\u00f6senlechner L, et al. (2010) ORO, a knowledge management platform for cognitive architectures in robotics. 
IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan, 18\u201324 October 2010.","DOI":"10.1109\/IROS.2010.5649547"},{"key":"bibr108-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219745"},{"key":"bibr109-02783649241229725","doi-asserted-by":"crossref","unstructured":"Li C, Xiao H, Tateno K, et al. (2016) Incremental scene understanding on dense SLAM. IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9 October 2016.","DOI":"10.1109\/IROS.2016.7759111"},{"key":"bibr110-02783649241229725","doi-asserted-by":"crossref","unstructured":"Li Y, Ouyang W, Zhou B, et al. (2017) Scene graph generation from objects, phrases and region captions. International Conference on Computer Vision (ICCV), Venice, Italy, 29 October 2017.","DOI":"10.1109\/ICCV.2017.142"},{"key":"bibr111-02783649241229725","unstructured":"Li Y, Gu C, Dullien T, et al. (2019) Graph matching networks for learning the similarity of graph structured objects. International Conference on Machine Learning (ICML), Long Beach, CA, USA, 15 June 2019."},{"key":"bibr112-02783649241229725","doi-asserted-by":"crossref","unstructured":"Lianos K, Sch\u00f6nberger J, Pollefeys M, et al. (2018) Vso: visual semantic odometry. European Conference on Computer Vision (ECCV), Munich, Germany, 8\u201314 September 2018.","DOI":"10.1007\/978-3-030-01225-0_15"},{"key":"bibr113-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3097242"},{"key":"bibr114-02783649241229725","doi-asserted-by":"crossref","unstructured":"Liu C, Wu J, Furukawa Y (2018) FloorNet: a unified framework for floorplan reconstruction from 3d scans. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8\u201314 September 2018.","DOI":"10.1007\/978-3-030-01231-1_13"},{"key":"bibr115-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794475"},{"key":"bibr116-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10161212"},{"key":"bibr117-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2015.2496823"},{"key":"bibr118-02783649241229725","doi-asserted-by":"crossref","unstructured":"Lu C, Krishna R, Bernstein M, et al. (2016) Visual relationship detection with language priors. European Conference on Computer Vision, Amsterdam, The Netherlands, 16 September 2016.","DOI":"10.1007\/978-3-319-46448-0_51"},{"key":"bibr119-02783649241229725","doi-asserted-by":"crossref","unstructured":"Lukierski R, Leutenegger S, Davison AJ (2017) Room layout estimation from rapid omnidirectional exploration. IEEE International Conference on Robotics and Automation (ICRA), Singapore, 3 June 2017.","DOI":"10.1109\/ICRA.2017.7989747"},{"key":"bibr120-02783649241229725","unstructured":"Maniu S, Senellart P, Jog S (2019) An experimental study of the treewidth of real-world graph data. International Conference Database Theory, Edinburgh, UK, 26\u201329 March 2019."},{"key":"bibr121-02783649241229725","doi-asserted-by":"crossref","unstructured":"Marino K, Chen X, Parikh D, et al. (2021) KRISP: integrating implicit and symbolic knowledge for open-domain knowledge-based VQA. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 19\u201325 June 2021.","DOI":"10.1109\/CVPR46437.2021.01389"},{"key":"bibr122-02783649241229725","doi-asserted-by":"crossref","unstructured":"McCormac J, Handa A, Davison AJ, et al. (2017) SemanticFusion: dense 3D semantic mapping with convolutional neural networks. 
IEEE International Conference on Robotics and Automation (ICRA), Singapore, 3 June 2017.","DOI":"10.1109\/ICRA.2017.7989538"},{"key":"bibr123-02783649241229725","doi-asserted-by":"crossref","unstructured":"McCormac J, Clark R, Bloesch M, et al. (2018) Fusion++: volumetric object-level SLAM. International Conference on 3D Vision (3DV), Verona, Italy, 5\u20138 September 2018.","DOI":"10.1109\/3DV.2018.00015"},{"key":"bibr124-02783649241229725","volume-title":"OWL Web Ontology Language Overview","author":"McGuinness D","year":"2004"},{"key":"bibr125-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1996.8.1.164"},{"key":"bibr126-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1142\/S0219530516400042"},{"key":"bibr127-02783649241229725","volume-title":"Efficient Estimation of Word Representations in Vector Space","author":"Mikolov T","year":"2013"},{"key":"bibr128-02783649241229725","doi-asserted-by":"crossref","unstructured":"Milford M, Wyeth G (2012) Seqslam: visual route-based navigation for sunny summer days and stormy winter nights. IEEE International Conference on Robotics and Automation (ICRA), St Paul, Minnesota, USA, 14\u201318 May 2012.","DOI":"10.1109\/ICRA.2012.6224623"},{"key":"bibr129-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219748"},{"key":"bibr130-02783649241229725","doi-asserted-by":"crossref","unstructured":"Mo K, Guerrero P, Yi L, et al. (2020) StructEdit: learning structural shape variations. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13\u201319 June 2020.","DOI":"10.1109\/CVPR42600.2020.00888"},{"key":"bibr131-02783649241229725","doi-asserted-by":"crossref","unstructured":"Movshovitz-Attias Y, Yu Q, Stumpe MC, et al. (2015) Ontological supervision for fine grained classification of street view storefronts. 
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7 June 2015.","DOI":"10.1109\/CVPR.2015.7298778"},{"key":"bibr132-02783649241229725","doi-asserted-by":"crossref","unstructured":"Narita G, Seno T, Ishikawa T, et al. (2019) Panopticfusion: online volumetric semantic mapping at the level of stuff and things. IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), The Venetian Macau, Macau, China, 4\u20138 September 2019.","DOI":"10.1109\/IROS40897.2019.8967890"},{"key":"bibr133-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2866205"},{"key":"bibr134-02783649241229725","doi-asserted-by":"crossref","unstructured":"Niemeyer M, Geiger A (2021) GIRAFFE: representing scenes as compositional generative neural feature fields. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19\u201325 June 2021.","DOI":"10.1109\/CVPR46437.2021.01129"},{"key":"bibr135-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/2508363.2508374"},{"key":"bibr136-02783649241229725","doi-asserted-by":"crossref","unstructured":"Niles I, Pease A (2001) Towards a standard upper ontology. Proceedings of the International Conference on Formal Ontology in Information Systems, Ogunquit, Maine, USA, 17\u201319 October 2001.","DOI":"10.1145\/505168.505170"},{"key":"bibr137-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561225"},{"key":"bibr138-02783649241229725","doi-asserted-by":"crossref","unstructured":"Oleynikova H, Taylor Z, Fehr M, et al. (2017) Voxblox: incremental 3d euclidean signed distance fields for on-board mav planning. IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, 24\u201328 September 2017.","DOI":"10.1109\/IROS.2017.8202315"},{"key":"bibr139-02783649241229725","doi-asserted-by":"crossref","unstructured":"Oleynikova H, Taylor Z, Siegwart R, et al. 
(2018) Sparse 3D topological graphs for micro-aerial vehicle planning. IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1\u20135 October 2018.","DOI":"10.1109\/IROS.2018.8594152"},{"key":"bibr140-02783649241229725","doi-asserted-by":"crossref","unstructured":"Park J, Florence P, Straub J, et al. (2019) DeepSDF: learning continuous signed distance functions for shape representation. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15\u201320 June 2019.","DOI":"10.1109\/CVPR.2019.00025"},{"key":"bibr141-02783649241229725","volume":"32","author":"Paszke A","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr142-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1007\/s11633-017-1054-2"},{"key":"bibr143-02783649241229725","volume":"15","author":"Porello D","year":"2015","journal-title":"Workshop on Neural Cognitive Integration"},{"key":"bibr144-02783649241229725","doi-asserted-by":"crossref","unstructured":"Qi S, Zhu Y, Huang S, et al. (2018) Human-centric indoor scene synthesis using stochastic grammar. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18\u201323 June 2018.","DOI":"10.1109\/CVPR.2018.00618"},{"key":"bibr145-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2021.103072"},{"key":"bibr146-02783649241229725","unstructured":"Rana K, Haviland J, Garg S, et al. (2023) SayPlan: grounding large language models using 3d scene graphs for scalable task planning. 7th Annual Conference on Robot Learning, Atlanta, USA, 18 January 2023."},{"key":"bibr147-02783649241229725","doi-asserted-by":"crossref","unstructured":"Ravichandran Z, Peng L, Hughes N, et al. (2022) Hierarchical representations and explicit memory: learning effective navigation policies on 3D scene graphs using graph neural networks. 
IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23\u201327 May 2022.","DOI":"10.1109\/ICRA46639.2022.9812179"},{"key":"bibr148-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2019.2953859"},{"key":"bibr149-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3181357"},{"key":"bibr150-02783649241229725","volume-title":"Image Question Answering: A Visual Semantic Embedding Model and a New Dataset","author":"Ren M","year":"2015"},{"key":"bibr151-02783649241229725","doi-asserted-by":"crossref","unstructured":"Rosinol A, Abate M, Chang Y, et al. (2020a) Kimera: an open-source library for real-time metric-semantic localization and mapping. IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 August 2020.","DOI":"10.1109\/ICRA40945.2020.9196885"},{"key":"bibr152-02783649241229725","doi-asserted-by":"crossref","unstructured":"Rosinol A, Gupta A, Abate M, et al. (2020b) 3D dynamic scene graphs: actionable spatial perception with places, objects, and humans. Robotics: Science and Systems (RSS), Daegu, Republic of Korea, 12\u201316 July 2020. https:\/\/news.mit.edu\/2020\/robots-spatial-perception-0715","DOI":"10.15607\/RSS.2020.XVI.079"},{"key":"bibr153-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1177\/02783649211056674"},{"key":"bibr154-02783649241229725","doi-asserted-by":"crossref","unstructured":"Rosinol A, Leonard J, Carlone L (2023) NeRF-SLAM: real-time dense monocular SLAM with neural radiance fields. 
IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, Michigan, USA, 1\u20135 October 2023.","DOI":"10.1109\/IROS55552.2023.10341922"},{"key":"bibr155-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01187-z"},{"key":"bibr156-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2016.12.016"},{"key":"bibr157-02783649241229725","volume-title":"Semantic 3D Object Maps for Everyday Manipulation in Human Living Environments","author":"Rusu RB","year":"2009"},{"key":"bibr158-02783649241229725","doi-asserted-by":"crossref","unstructured":"Salas-Moreno RF, Newcombe RA, Strasdat H, et al. (2013) SLAM++: simultaneous localisation and mapping at the level of objects. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23\u201328 June 2013.","DOI":"10.1109\/CVPR.2013.178"},{"key":"bibr159-02783649241229725","doi-asserted-by":"crossref","unstructured":"Sandler M, Howard A, Zhu M, et al. (2018) Mobilenetv2: inverted residuals and linear bottlenecks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18\u201323 June 2018.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"bibr160-02783649241229725","doi-asserted-by":"crossref","unstructured":"Savva M, Kadian A, Maksymets O, et al. (2019) Habitat: a platform for embodied AI research. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October 2019.","DOI":"10.1109\/ICCV.2019.00943"},{"key":"bibr161-02783649241229725","doi-asserted-by":"crossref","unstructured":"Schlenoff C, Prestes E, Madhavan R, et al. (2012) An IEEE standard ontology for robotics and automation. 
IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Algarve, 7\u201312 October 2012.","DOI":"10.1109\/IROS.2012.6385518"},{"key":"bibr162-02783649241229725","volume-title":"Panoptic Multi-Tsdfs: A Flexible Representation for Online Multi-Resolution Volumetric Mapping and Long-Term Dynamic Scene Consistency","author":"Schmid L","year":"2021"},{"key":"bibr163-02783649241229725","doi-asserted-by":"crossref","unstructured":"Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7\u201312 June 2015.","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"bibr164-02783649241229725","doi-asserted-by":"crossref","unstructured":"Schubert S, Neubert P, Protzel P (2021) Fast and memory efficient graph optimization via ICM for visual place recognition. Proceedings of Robotics: Science and Systems (RSS), New York City, NY, USA, 12\u201316 July 2021.","DOI":"10.15607\/RSS.2021.XVII.091"},{"key":"bibr165-02783649241229725","doi-asserted-by":"crossref","unstructured":"Shan M, Feng Q, Atanasov N (2020) Object residual constrained visual-inertial odometry. IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, Nevada, USA, 25\u201329 October 2020.","DOI":"10.1109\/IROS45743.2020.9341660"},{"key":"bibr166-02783649241229725","doi-asserted-by":"crossref","unstructured":"Shi J, Talak R, Maggio D, et al. (2023) A correct-and-certify approach to self-supervise object pose estimators via ensemble self-training. 
Robotics: Science and Systems (RSS), Daegu, Republic of Korea, 14 July 2023.","DOI":"10.15607\/RSS.2023.XIX.076"},{"key":"bibr167-02783649241229725","volume-title":"Beyond Concepts: Ontology as Reality Representation","author":"Smith B","year":"2004"},{"key":"bibr168-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.11164"},{"key":"bibr169-02783649241229725","volume-title":"MonteFloor: Extending MCTS for Reconstructing Accurate Large-Scale Floor Plans","author":"Stekovic S","year":"2021"},{"key":"bibr170-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2013.02.008"},{"key":"bibr171-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-023X(97)00056-6"},{"key":"bibr172-02783649241229725","doi-asserted-by":"crossref","unstructured":"Sucar E, Wada K, Davison A (2020) NodeSLAM: neural object descriptors for multi-view shape reconstruction. 2020 International Conference on 3D Vision (3DV), Fukuoka, Japan, 25\u201328 November 2020.","DOI":"10.1109\/3DV50981.2020.00105"},{"key":"bibr173-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2008.06.001"},{"key":"bibr174-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/1275808.1276478"},{"key":"bibr175-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.12865"},{"key":"bibr176-02783649241229725","unstructured":"Talak R, Hu S, Peng L, et al. (2021) Neural trees for learning on graphs. Conference on Neural Information Processing Systems (NeurIPS), Canada, 6\u201314 December 2021."},{"key":"bibr177-02783649241229725","doi-asserted-by":"crossref","unstructured":"Tateno K, Tombari F, Navab N (2015) Real-time and scalable incremental segmentation on dense SLAM. 
IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September 2015.","DOI":"10.1109\/IROS.2015.7354011"},{"key":"bibr178-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913481635"},{"issue":"4","key":"bibr179-02783649241229725","doi-asserted-by":"crossref","first-page":"930","DOI":"10.1198\/jcgs.2009.07129","volume":"18","author":"Thomas A","year":"2009","journal-title":"Journal of Computational & Graphical Statistics: A Joint Publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America"},{"key":"bibr180-02783649241229725","first-page":"1","volume-title":"Exploring Artificial Intelligence in the New Millennium","author":"Thrun S","year":"2003"},{"key":"bibr181-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.13791"},{"key":"bibr182-02783649241229725","unstructured":"Veli\u010dkovi\u0107 P, Cucurull G, Casanova A, et al. (2018) Graph attention networks. International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 3 May 2018."},{"key":"bibr183-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1145\/2629489"},{"key":"bibr184-02783649241229725","doi-asserted-by":"crossref","unstructured":"Wald J, Dhamo H, Navab N, et al. (2020) Learning 3D semantic scene graphs from 3D indoor reconstructions. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13\u201319 June 2020.","DOI":"10.1109\/CVPR42600.2020.00402"},{"key":"bibr185-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2983686"},{"issue":"7","key":"bibr186-02783649241229725","first-page":"3508","volume":"44","author":"Wang W","year":"2022","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"bibr187-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-023-01659-w"},{"key":"bibr188-02783649241229725","unstructured":"Whelan T, McDonald JB, Kaess M, et al. (2012) Kintinuous: spatially extended kinect-fusion. RSS Workshop on RGB-D: advanced reasoning with depth cameras, Sydney, Australia, 12 July 2012."},{"key":"bibr189-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1177\/0278364914551008"},{"key":"bibr190-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1177\/0278364916669237"},{"key":"bibr191-02783649241229725","doi-asserted-by":"crossref","unstructured":"Wu S, Wald J, Tateno K, et al. (2021) SceneGraphFusion: incremental 3D scene graph prediction from RGB-D sequences. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19\u201325 June 2021.","DOI":"10.1109\/CVPR46437.2021.00743"},{"key":"bibr192-02783649241229725","unstructured":"Xie S, Morcos AS, Zhu SC, et al. (2022) COAT: measuring object compositionality in emergent representations. International Conference on Machine Learning (ICML), Baltimore, MD, 17\u201323 July 2022."},{"key":"bibr193-02783649241229725","doi-asserted-by":"crossref","unstructured":"Xu D, Zhu Y, Choy CB, et al. (2017) Scene graph generation by iterative message passing. 
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21\u201326 July 2017.","DOI":"10.1109\/CVPR.2017.330"},{"key":"bibr194-02783649241229725","volume-title":"MID-fusion: Octree-Based Object-Level Multi-Instance Dynamic SLAM","author":"Xu B","year":"2019"},{"key":"bibr195-02783649241229725","unstructured":"Xu K, Hu W, Leskovec J, et al. (2019b) How powerful are graph neural networks? International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6\u20139 May 2019."},{"key":"bibr196-02783649241229725","doi-asserted-by":"crossref","unstructured":"Yang J, Lu J, Lee S, et al. (2018) Graph R-CNN for scene graph generation. European Conference on Computer Vision (ECCV), Munich, Germany, 8\u201314 September 2018.","DOI":"10.1007\/978-3-030-01246-5_41"},{"key":"bibr197-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2020.3033695"},{"key":"bibr198-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3286184"},{"key":"bibr199-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58592-1_36"},{"key":"bibr200-02783649241229725","doi-asserted-by":"crossref","unstructured":"Zellers R, Yatskar M, Thomson S, et al. (2017) Neural motifs: scene graph parsing with global context. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21\u201326 July 2017.","DOI":"10.1109\/CVPR.2018.00611"},{"key":"bibr201-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2008.03.007"},{"key":"bibr202-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.gmod.2012.09.002"},{"key":"bibr203-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3042881"},{"key":"bibr204-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.108153"},{"key":"bibr205-02783649241229725","doi-asserted-by":"crossref","unstructured":"Zhou B, Zhao H, Puig X, et al. 
(2017) Scene parsing through ade20k dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21\u201326 July 2017.","DOI":"10.1109\/CVPR.2017.544"},{"key":"bibr206-02783649241229725","volume-title":"Computer Vision: Stochastic Grammars for Parsing Objects, Scenes, and Events","author":"Zhu SC","year":"2021"},{"key":"bibr207-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1561\/0600000018"},{"key":"bibr208-02783649241229725","doi-asserted-by":"publisher","DOI":"10.1007\/s10851-011-0282-2"},{"key":"bibr209-02783649241229725","volume-title":"Scene Graph Generation: A Comprehensive Survey","author":"Zhu G","year":"2022"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649241229725","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/02783649241229725","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649241229725","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T20:25:27Z","timestamp":1740860727000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/02783649241229725"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,12]]},"references-count":209,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["10.1177\/02783649241229725"],"URL":"https:\/\/doi.org\/10.1177\/02783649241229725","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,12]]
}}}