{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:16:13Z","timestamp":1750306573980,"version":"3.41.0"},"reference-count":81,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2015,3,31]],"date-time":"2015-03-31T00:00:00Z","timestamp":1427760000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Japan's Ministry of Land, Infrastructure, Transport and Tourism (MLIT)"},{"name":"Microsoft Research"},{"name":"Grant-in-Aid for Young Scientists (23700192) of Japan's Ministry of Education, Culture, Sports, Science, and Technology (MEXT)"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2015,5,4]]},"abstract":"<jats:p>Mining object-level knowledge, that is, building a comprehensive category model base, from a large set of cluttered scenes presents a considerable challenge to the field of artificial intelligence. How to initiate model learning with the least human supervision (i.e., manual labeling) and how to encode the structural knowledge are two elements of this challenge, as they largely determine the scalability and applicability of any solution. In this article, we propose a model-learning method that starts from a single-labeled object for each category, and mines further model knowledge from a number of informally captured, cluttered scenes. However, in these scenes, target objects are relatively small and have large variations in texture, scale, and rotation. 
Thus, to reduce the model bias normally associated with less supervised learning methods, we use the robust 3D shape in RGB-D images to guide our model learning, then apply the properly trained category models to both object detection and recognition in more conventional RGB images. In addition to model training for their own categories, the knowledge extracted from the RGB-D images can also be transferred to guide model learning for a new category, in which only RGB images without depth information in the new category are provided for training. Preliminary testing shows that the proposed method performs as well as fully supervised learning methods.<\/jats:p>","DOI":"10.1145\/2629701","type":"journal-article","created":{"date-parts":[[2015,4,3]],"date-time":"2015-04-03T20:29:44Z","timestamp":1428092984000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["From RGB-D Images to RGB Images"],"prefix":"10.1145","volume":"6","author":[{"given":"Quanshi","family":"Zhang","sequence":"first","affiliation":[{"name":"University of Tokyo; University of California, Los Angeles, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuan","family":"Song","sequence":"additional","affiliation":[{"name":"University of Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaowei","family":"Shao","sequence":"additional","affiliation":[{"name":"University of Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huijing","family":"Zhao","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ryosuke","family":"Shibasaki","sequence":"additional","affiliation":[{"name":"University of Tokyo, Tokyo, 
Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,3,31]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33712-3_37"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.161"},{"key":"e_1_2_1_3_1","unstructured":"G. Bradski and T. Hong. 2011. NIST and willow garage: Solution in perception challenge. Retrieved from http:\/\/www.willowgarage.com\/blog\/2011\/02\/28\/nist-and-willow-garage-solutions-perception-challenge."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2011.6130385"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1073204.1073314"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2009.28"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.356"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.48"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.11"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 398--405","author":"Cho M.","key":"e_1_2_1_10_1"},{"volume-title":"International Conference on Computer Vision and Pattern Recognition (CVPR). 1617--1624","author":"Cho M.","key":"e_1_2_1_11_1"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"A. Collet, S. S. Srinivasa, and M. Hebert. 2011. Structure discovery in multi-modal data: A region-based approach. 
In ICRA.","DOI":"10.1109\/ICRA.2011.5980475"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.177"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126445"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33786-4_35"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0270-9"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.421"},{"volume-title":"Retrieved","year":"2015","author":"Helmer S.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2002.11.007"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2653--2660","author":"Hsiao E.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.374"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2336--2343","year":"2012","author":"Hu W.","key":"e_1_2_1_22_1"},{"volume-title":"Retrieved","year":"2012","author":"Janoch A.","key":"e_1_2_1_23_1"},{"volume-title":"International Conference on Visual Information System. 63--76","year":"2003","author":"Jiang H.","key":"e_1_2_1_24_1"},{"volume-title":"International Conference on Computer Vision and Pattern Recognition (CVPR). 542--549","author":"Joulin A.","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126314"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1--8.","author":"Kim G.","key":"e_1_2_1_27_1"},{"volume-title":"International Conference on Computer Vision and Pattern Recognition (CVPR). 
837--844","author":"Kim G.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126239"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33718-5_20"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.200"},{"key":"e_1_2_1_32_1","unstructured":"H. S. Koppula, A. Anand, T. Joachims, and A. Saxena. 2011. Semantic labeling of 3D point clouds for indoor scenes. In Neural Information Processing Systems (NIPS). 244--252."},{"volume-title":"IEEE International Conference on Robotics and Automation (ICRA). 1817--1824","author":"Lai K.","key":"e_1_2_1_33_1"},{"volume-title":"IEEE International Conference on Robotics and Automation (ICRA). 4007--4013","author":"Lai K.","key":"e_1_2_1_34_1"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910369190"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. 2012. Building high-level features using large scale unsupervised learning. In ICML.","DOI":"10.1109\/ICASSP.2013.6639343"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995523"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2005.20"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1--8.","author":"Leordeanu M.","key":"e_1_2_1_39_1"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-011-0442-2"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 
1--8.","author":"Leordeanu M.","key":"e_1_2_1_41_1"},{"volume-title":"International Conference on Computer Vision and Pattern Recognition (CVPR). 2735--2742","author":"Li C.","key":"e_1_2_1_42_1"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.79"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0265-6"},{"volume-title":"International Conference on Computer Vision and Pattern Recognition (CVPR). 3442--3449","author":"Liao Z.","key":"e_1_2_1_45_1"},{"key":"e_1_2_1_46_1","unstructured":"C.-J. Lin and R. C. Weng. 2004. Simple probabilistic predictions for support vector regression. Technical report, Department of Computer Science, National Taiwan University, Taiwan."},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1609--1616","author":"Liu H.","key":"e_1_2_1_47_1"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 574--581","author":"Liu H.","key":"e_1_2_1_48_1"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 917--924","author":"Liu K.","key":"e_1_2_1_49_1"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1038--1045","author":"Maji S.","key":"e_1_2_1_50_1"},{"key":"e_1_2_1_51_1","unstructured":"Microsoft. 2011. Introducing Kinect for Xbox 360."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33765-9_10"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 
1576--1583","author":"Olsson C.","key":"e_1_2_1_53_1"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33783-3_26"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33712-3_23"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2759--2766","author":"Ren X.","key":"e_1_2_1_56_1"},{"volume-title":"IEEE International Conference on Computer Vision Workshop (ICCV Workshops). 601--608","author":"Silberman N.","key":"e_1_2_1_57_1"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33715-4_54"},{"volume-title":"11th European Conference on Computer Vision (ECCV). 658--671","author":"Sun M.","key":"e_1_2_1_59_1"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33868-7_10"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2009.01.002"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88688-4_44"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0271-8"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995430"},{"volume-title":"9th International Conference on Computer Vision (ICCV). 257--264","author":"Wallraven C.","key":"e_1_2_1_65_1"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.234"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 151--158","author":"Wang X.","key":"e_1_2_1_67_1"},{"volume-title":"IEEE International Conference on Robotics and Automation (ICRA). 5384--5391","author":"Wohlkinger W.","key":"e_1_2_1_68_1"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/2072298.2072021"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 
3402--3409","author":"Xu Y.","key":"e_1_2_1_70_1"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2011.2181952"},{"key":"e_1_2_1_72_1","unstructured":"Q. Zhang. 2013. Category Dataset of Kinect RGBD Images."},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.32"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.168"},{"volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).","author":"Zhang Q.","key":"e_1_2_1_75_1"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.181"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.95"},{"volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).","author":"Zhang Q.","key":"e_1_2_1_78_1"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874127"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.376"},{"volume-title":"International Conference on Computer Vision and Pattern Recognition (CVPR). 
3218--3225","author":"Zhu J.-Y.","key":"e_1_2_1_81_1"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629701","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2629701","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:31Z","timestamp":1750227211000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629701"}},"subtitle":["Single Labeling for Mining Visual Models"],"short-title":[],"issued":{"date-parts":[[2015,3,31]]},"references-count":81,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,5,4]]}},"alternative-id":["10.1145\/2629701"],"URL":"https:\/\/doi.org\/10.1145\/2629701","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2015,3,31]]},"assertion":[{"value":"2014-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-03-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}