{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T12:28:59Z","timestamp":1766579339864,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2016,12,13]],"date-time":"2016-12-13T00:00:00Z","timestamp":1481587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu Province","doi-asserted-by":"publisher","award":["BK20130451"],"award-info":[{"award-number":["BK20130451"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018615","name":"University Science Research Project of Jiangsu Province","doi-asserted-by":"publisher","award":["13KJB520025"],"award-info":[{"award-number":["13KJB520025"]}],"id":[{"id":"10.13039\/501100018615","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Open Fund of Jiangsu Province Key Laboratory for Remote Measuring and Control","award":["YCCK201402","YCCK201502"],"award-info":[{"award-number":["YCCK201402","YCCK201502"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Controlling robots by natural language (NL) is increasingly attracting attention for its versatility, its convenience and the fact that users need no extensive training. Grounding, which enables robots to understand NL instructions from humans, is a crucial challenge in this problem. This paper mainly explores the object grounding problem and concretely studies how to detect target objects specified by NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. 
With the metric information of all segmented objects, the object attributes and relations between objects are further extracted. The NL instructions that incorporate multiple cues for object specification are parsed into domain-specific annotations. The annotations from NL and the information extracted from the RGB-D camera are matched in a computational state estimation framework to search all possible object grounding states. The final grounding is accomplished by selecting the states that have the maximum probabilities. An RGB-D scene dataset associated with different groups of NL instructions based on different cognition levels of the robot is collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming using a mobile manipulator show its effectiveness and practicability in robotic applications.<\/jats:p>","DOI":"10.3390\/s16122117","type":"journal-article","created":{"date-parts":[[2016,12,13]],"date-time":"2016-12-13T10:15:52Z","timestamp":1481624152000},"page":"2117","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera"],"prefix":"10.3390","volume":"16","author":[{"given":"Jiatong","family":"Bao","sequence":"first","affiliation":[{"name":"Department of Hydraulic, Energy and Power Engineering, Yangzhou University, Yangzhou 225127, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunyi","family":"Jia","sequence":"additional","affiliation":[{"name":"Department of Automotive Engineering, Clemson University, Greenville, SC 29607, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Cheng","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongru","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Hydraulic, Energy and Power Engineering, Yangzhou University, Yangzhou 225127, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ning","family":"Xi","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2016,12,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"She, L., Cheng, Y., Chai, J.Y., Jia, Y., Yang, S., and Xi, N. (2014, August 25\u201329). Teaching robots new actions through natural language instructions. Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication, Edinburgh, UK.","DOI":"10.1109\/ROMAN.2014.6926362"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Bao, J., Jia, Y., Cheng, Y., Tang, H., and Xi, N. (2015, December 6\u20139). Feedback of robot states for object detection in natural language controlled robotic systems. Proceedings of the IEEE International Conference on Robotics and Biomimetics, Zhuhai, China.","DOI":"10.1109\/ROBIO.2015.7418881"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1007\/978-3-319-00065-7_28","article-title":"Learning to parse natural language commands to a robot control system","volume":"88","author":"Matuszek","year":"2013","journal-title":"Exp. Robot."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Dzifcak, J., Scheutz, M., Baral, C., and Schermerhorn, P. (2009, May 12\u201317). What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution. 
Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan.","DOI":"10.1109\/ROBOT.2009.5152776"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1343","DOI":"10.1163\/156855308X344864","article-title":"Translating structured English to robot controllers","volume":"22","author":"Fainekos","year":"2008","journal-title":"Adv. Robot."},{"key":"ref_6","unstructured":"Chen, D.L., and Mooney, R.J. (2011, August 7\u201311). Learning to interpret natural language navigation instructions from observations. Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Stenmark, M., and Malec, J. (2014, August 24\u201329). Describing constraint-based assembly tasks in unstructured natural language. Proceedings of the IFAC World Congress, Cape Town, South Africa.","DOI":"10.3182\/20140824-6-ZA-1003.02062"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1177\/0278364915602060","article-title":"Tell me Dave: Context-sensitive grounding of natural language to manipulation instructions","volume":"35","author":"Misra","year":"2016","journal-title":"Int. J. Robot. Res."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Forbes, M., Rao, R., Zettlemoyer, L., and Cakmak, M. (2015, May 26\u201330). Robot programming by demonstration with situated spatial language understanding. Proceedings of the IEEE International Conference on Robotics and Automation, Seattle, WA, USA.","DOI":"10.1109\/ICRA.2015.7139462"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/s11370-008-0016-5","article-title":"Using dialog and human observations to dictate tasks to a learning robot assistant","volume":"1","author":"Rybski","year":"2008","journal-title":"Intell. Service Robot."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Siebert, A., and Schlangen, D. (2008, June 19\u201320). 
A simple method for resolution of definite reference in a shared visual context. Proceedings of the SIGdial Workshop on Discourse and Dialogue, Columbus, OH, USA.","DOI":"10.3115\/1622064.1622080"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1284","DOI":"10.1177\/0278364911401765","article-title":"The MOPED framework: Object recognition and pose estimation for manipulation","volume":"30","author":"Collet","year":"2011","journal-title":"Int. J. Robot. Res."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Schwarz, M., Schulz, H., and Behnke, S. (2015, May 26\u201330). RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features. Proceedings of the IEEE International Conference on Robotics and Automation, Seattle, WA, USA.","DOI":"10.1109\/ICRA.2015.7139363"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Guadarrama, S., Riano, L., Golland, D., Gohring, D., Jia, Y., Klein, D., Abbeel, P., and Darrell, T. (2013, November 3\u20137). Grounding spatial relations for human-robot interaction. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan.","DOI":"10.1109\/IROS.2013.6696569"},{"key":"ref_15","unstructured":"Sun, Y., Bo, L., and Fox, D. (2013, May 6\u201310). Attribute based object identification. Proceedings of the IEEE International Conference on Robotics and Automation, Karlsruhe, Germany."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zampogiannis, K., Yang, Y., Ferm\u00fcller, C., and Aloimonos, Y. (2015, May 26\u201330). Learning the spatial semantics of manipulation actions through preposition grounding. Proceedings of the IEEE International Conference on Robotics and Automation, Seattle, WA, USA.","DOI":"10.1109\/ICRA.2015.7139371"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Bjorkman, M., and Kragic, D. (2010, May 3\u20137). 
Active 3D scene segmentation and detection of unknown objects. Proceedings of the IEEE International Conference on Robotics and Automation, Anchorage, AK, USA.","DOI":"10.1109\/ROBOT.2010.5509973"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1109\/TPAMI.2011.171","article-title":"Active visual segmentation","volume":"34","author":"Mishra","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_19","unstructured":"Potapova, E., Varadarajan, K.M., Richtsfeld, A., Zillich, M., and Vincze, M. (2014, May 31\u2013June 7). Attention-driven object detection and segmentation of cluttered table scenes using 2.5D symmetry. Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"21054","DOI":"10.3390\/s150921054","article-title":"Saliency-guided detection of unknown objects in RGB-D indoor scenes","volume":"15","author":"Bao","year":"2015","journal-title":"Sensors"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Johnson-Roberson, M., Bohg, J., Skantze, G., and Gustafson, J. (2011, September 25\u201330). Enhanced visual scene understanding through human-robot dialog. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA.","DOI":"10.1109\/IROS.2011.6048219"},{"key":"ref_22","unstructured":"Sun, Y., Bo, L., and Fox, D. (2014, May 31\u2013June 7). Learning to identify new objects. Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1167","DOI":"10.1177\/0278364914537359","article-title":"A framework for learning semantic maps from grounded natural language descriptions","volume":"33","author":"Walter","year":"2014","journal-title":"Int. J. Robot. 
Res."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Hemachandra, S., Duvallet, F., Howard, T.M., Roy, N., Stentz, A., and Walter, M.R. (2015, May 26\u201330). Learning models for following natural language directions in unknown environments. Proceedings of the IEEE International Conference on Robotics and Automation, Seattle, WA, USA.","DOI":"10.1109\/ICRA.2015.7139984"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tellex, S., Kollar, T., Dickerson, S., Walter, M.R., Banerjee, A.G., Teller, S.J., and Roy, N. (2011, August 7\u201311). Understanding natural language commands for robotic navigation and mobile manipulation. Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v25i1.7979"},{"key":"ref_26","unstructured":"Howard, T.M., Chung, I., Propp, O., Walter, M.R., and Roy, N. (2014, September 14\u201318). Efficient natural language interfaces for assistive robots. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA."},{"key":"ref_27","unstructured":"Hu, R., Xu, H., Rohrbach, M., Feng, J., Saenko, K., and Darrell, T. (2016, June 26\u2013July 1). Natural language object retrieval. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1437","DOI":"10.3390\/s120201437","article-title":"Accuracy and resolution of Kinect depth data for indoor mapping applications","volume":"12","author":"Khoshelham","year":"2012","journal-title":"Sensors"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1007\/s11263-014-0777-6","article-title":"Indoor scene understanding with RGB-D images: Bottom-up segmentation, object detection and semantic segmentation","volume":"112","author":"Gupta","year":"2015","journal-title":"Int. J. Comput. Vision"},{"key":"ref_30","unstructured":"Rodriguez, S., Burrus, N., and Abderrahim, M. 
(2013, February 21\u201324). 3D object reconstruction with a single RGB-Depth image. Proceedings of the International Conference on Computer Vision Theory and Applications, Barcelona, Spain."},{"key":"ref_31","unstructured":"Bo, L., Ren, X., and Fox, D. (2012, June 18\u201321). Unsupervised feature learning for RGB-D based object recognition. Proceedings of the International Symposium on Experimental Robotics, Quebec, QC, Canada."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bo, L., Ren, X., and Fox, D. (2013, June 25\u201327). Multipath sparse coding using hierarchical matching pursuit. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.91"},{"key":"ref_33","first-page":"1871","article-title":"LIBLINEAR: A library for large linear classification","volume":"9","author":"Fan","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_34","unstructured":"Carnegie Mellon University CMU Sphinx. Available online: http:\/\/cmusphinx.sourceforge.net\/."},{"key":"ref_35","unstructured":"Jia, Y., Xi, N., Chai, J., Cheng, Y., Fang, R., and She, L. (2014, May 31\u2013June 7). Perceptive feedback for natural language control of robotic operators. Proceedings of the IEEE International Conference on Robotics and Automation, Hong Kong, China."},{"key":"ref_36","unstructured":"Klein, D., and Manning, C.D. (2003, July 7\u201312). Accurate unlexicalized parsing. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan."},{"key":"ref_37","unstructured":"Taylor, A., Marcus, M., and Santorini, B. (2003). Treebanks, Springer."},{"key":"ref_38","unstructured":"Bao, J. Referential Grounding in Robotics. Available online: http:\/\/www.jiatongbao.net\/research\/rg\/."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Jia, Y., She, L., Cheng, Y., Bao, J., Chai, J., and Xi, N. (2016, August 21\u201324). 
Program robots manufacturing tasks by natural language instructions. Proceedings of the IEEE\/RAS International Conference on Automation Science and Engineering, Fort Worth, TX, USA.","DOI":"10.1109\/COASE.2016.7743461"},{"key":"ref_40","unstructured":"Bao, J. Natural Language Based Robot Programming. Available online: http:\/\/www.jiatongbao.net\/research\/nlrp\/."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/16\/12\/2117\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T19:28:29Z","timestamp":1760210909000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/16\/12\/2117"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12,13]]},"references-count":40,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2016,12]]}},"alternative-id":["s16122117"],"URL":"https:\/\/doi.org\/10.3390\/s16122117","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2016,12,13]]}}}