{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T16:35:02Z","timestamp":1773678902195,"version":"3.50.1"},"reference-count":59,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2012,11,15]],"date-time":"2012-11-15T00:00:00Z","timestamp":1352937600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2013,1]]},"abstract":"<jats:p> RGB-D cameras, which give an RGB image together with depths, are becoming increasingly popular for robotic perception. In this paper, we address the task of detecting commonly found objects in the three-dimensional (3D) point cloud of indoor scenes obtained from such cameras. Our method uses a graphical model that captures various features and contextual relations, including the local visual appearance and shape cues, object co-occurrence relationships and geometric relationships. With a large number of object classes and relations, the model\u2019s parsimony becomes important and we address that by using multiple types of edge potentials. We train the model using a maximum-margin learning approach. In our experiments concerning a total of 52 3D scenes of homes and offices (composed from about 550 views), we get a performance of 84.06% and 73.38% in labeling office and home scenes respectively for 17 object classes each. We also present a method for a robot to search for an object using the learned model and the contextual information available from the current labelings of the scene. 
We applied this algorithm successfully on a mobile robot for the task of finding 12 object classes in 10 different offices and achieved a precision of 97.56% with 78.43% recall.<jats:sup>1<\/jats:sup> <\/jats:p>","DOI":"10.1177\/0278364912461538","type":"journal-article","created":{"date-parts":[[2012,11,16]],"date-time":"2012-11-16T03:17:11Z","timestamp":1353035831000},"page":"19-34","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":128,"title":["Contextually guided semantic labeling and search for three-dimensional point clouds"],"prefix":"10.1177","volume":"32","author":[{"given":"Abhishek","family":"Anand","sequence":"first","affiliation":[{"name":"Department of Computer Science, Cornell University, USA"}]},{"given":"Hema Swetha","family":"Koppula","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Cornell University, USA"}]},{"given":"Thorsten","family":"Joachims","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Cornell University, USA"}]},{"given":"Ashutosh","family":"Saxena","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Cornell University, USA"}]}],"member":"179","published-online":{"date-parts":[[2012,11,15]]},"reference":[{"key":"bibr1-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.133"},{"key":"bibr2-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6247920"},{"key":"bibr3-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1016\/S0166-218X(01)00341-9"},{"key":"bibr4-0278364912461538","unstructured":"Collet A, Martinez M, Srinivasa SS (2011) The moped framework: Object recognition and pose estimation for manipulation. 
International Journal of Robotics Research doi\u2009=\u200910.1177\/0278364911401765, eprint\u2009=\u2009http:\/\/ijr.sagepub.com\/content\/early\/2011\/03\/31\/0278364911401765.full.pdf+html."},{"key":"bibr5-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980475"},{"key":"bibr6-0278364912461538","first-page":"22","volume-title":"Workshop on Statistical Learning in Computer Vision, European Conference on Computer Vision (ECCV)","author":"Csurka G","year":"2004"},{"key":"bibr7-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.177"},{"key":"bibr8-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/34.982896"},{"key":"bibr9-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2004.1307225"},{"key":"bibr10-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206532"},{"key":"bibr11-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910373409"},{"key":"bibr12-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587597"},{"key":"bibr13-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390195"},{"key":"bibr14-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459471"},{"key":"bibr15-0278364912461538","volume-title":"European Conference on Computer Vision Workshop Multi-camera Multi-modal (M2SFA2)","author":"Gould S","year":"2008"},{"key":"bibr16-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1007\/BF02612354"},{"key":"bibr17-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15567-3_17"},{"key":"bibr18-0278364912461538","volume-title":"Neural Information Processing Systems (NIPS)","author":"Heitz G","year":"2008"},{"key":"bibr19-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88682-2_4"},{"key":"bibr20-0278364912461538","unstructured":"Hoiem D, Efros AA, Hebert M (2006) Putting objects in perspective. 
In Computer Vision and Pattern Recognition."},{"key":"bibr21-0278364912461538","volume-title":"International Joint Conference on Artificial Intelligence (IJCAI)","author":"Jia Z","year":"2011"},{"key":"bibr22-0278364912461538","volume-title":"International Joint Conference on Artificial Intelligence (IJCAI)","author":"Jia Z","year":"2011"},{"key":"bibr23-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1177\/0278364912438781"},{"key":"bibr24-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-009-5108-8"},{"key":"bibr25-0278364912461538","volume-title":"Neural Information Processing Systems (NIPS)","author":"Koppula H","year":"2011"},{"key":"bibr26-0278364912461538","unstructured":"Koppula H. S, Anand A, Joachims T, Saxena A (2011b) Labeling 3d scenes for personal assistant robots. In RSS workshop RGB-D: Advanced Reasoning with Depth Cameras."},{"key":"bibr27-0278364912461538","volume-title":"International Conference on Machine Learning (ICML)","author":"Lafferty J. 
D","year":"2001"},{"key":"bibr28-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980382"},{"key":"bibr29-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980377"},{"key":"bibr30-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1002\/rob.20134"},{"key":"bibr31-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-005-4436-9"},{"key":"bibr32-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2004.1334476"},{"key":"bibr33-0278364912461538","volume-title":"Neural Information Processing Systems (NIPS)","author":"Lee DC","year":"2010"},{"key":"bibr34-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2007.383146"},{"key":"bibr35-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.232"},{"key":"bibr36-0278364912461538","volume-title":"Neural Information Processing Systems (NIPS)","author":"Li C","year":"2011"},{"key":"bibr37-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1145\/2330163.2330297"},{"key":"bibr38-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1177\/0278364911410090"},{"key":"bibr39-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2010.5509703"},{"key":"bibr40-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2009.5152856"},{"key":"bibr41-0278364912461538","volume-title":"Neural Information Processing Systems (NIPS)","author":"Murphy KP","year":"2003"},{"key":"bibr42-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2010.03.005"},{"key":"bibr43-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2009.5152750"},{"key":"bibr44-0278364912461538","doi-asserted-by":"crossref","unstructured":"Rother C, Kolmogorov V, Lempitsky V, Szummer M (2007) Optimizing binary mrfs via extended roof duality. 
In Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR.2007.383203"},{"key":"bibr45-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2008.08.005"},{"key":"bibr46-0278364912461538","volume-title":"Neural Information Processing Systems (NIPS)","author":"Saxena A","year":"2005"},{"key":"bibr47-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-007-0071-y"},{"key":"bibr48-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.132"},{"key":"bibr49-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/3DIMPVT.2011.10"},{"key":"bibr50-0278364912461538","unstructured":"Shapovalov R, Velizhev A, Barinova O (2010) Non-associative markov networks for 3d point cloud classification. In ISPRS Commission III Symposium \u2013 PCV 2010."},{"key":"bibr51-0278364912461538","volume-title":"IEEE International Conference on Robotics and Automation","author":"Sung J","year":"2012"},{"key":"bibr52-0278364912461538","doi-asserted-by":"crossref","unstructured":"Szeliski R, Zabih R, Scharstein D, Veksler O, Kolmogorov V, Agarwala A, Tappen M, Rother C (2008) A comparative study of energy minimization methods for markov random fields with smoothness-based priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 
1068\u20131080.","DOI":"10.1109\/TPAMI.2007.70844"},{"key":"bibr53-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015444"},{"key":"bibr54-0278364912461538","volume-title":"Neural Information Processing Systems (NIPS)","author":"Taskar B","year":"2003"},{"key":"bibr55-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1023\/A:1023052124951"},{"key":"bibr56-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015341"},{"key":"bibr57-0278364912461538","volume-title":"International Conference on Computer Vision (ICCV)","author":"Xiao J","year":"2009"},{"key":"bibr58-0278364912461538","doi-asserted-by":"publisher","DOI":"10.5244\/C.24.45"},{"key":"bibr59-0278364912461538","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980125"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364912461538","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0278364912461538","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364912461538","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T05:34:27Z","timestamp":1741066467000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0278364912461538"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,11,15]]},"references-count":59,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,1]]}},"alternative-id":["10.1177\/0278364912461538"],"URL":"https:\/\/doi.org\/10.1177\/0278364912461538","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,11,15]]}}}