{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T20:54:07Z","timestamp":1760043247720,"version":"3.41.2"},"reference-count":22,"publisher":"Emerald","issue":"6","license":[{"start":{"date-parts":[[2012,10,12]],"date-time":"2012-10-12T00:00:00Z","timestamp":1350000000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,10,12]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-heading\">Purpose<\/jats:title><jats:p>The purpose of this paper is to design an interactive industrial robotic system which can be used to assist a \u201clayperson\u201d in re\u2010casting a generic pick\u2010and\u2010place application. A user can program a pick\u2010and\u2010place application simply by pointing to objects in the work area and speaking simple and intuitive natural language commands.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Design\/methodology\/approach<\/jats:title><jats:p>The system was implemented in C# using the EMGU wrapper classes for OpenCV as well as the MS Speech Recognition API. The target language to be recognized was modelled using traditional augmented transition networks which were implemented as XML Grammars. The authors developed an original finger\u2010pointing algorithm using a unique combination of standard morphological and image processing techniques. Recognized voice commands trigger the vision component to capture what a user is pointing at. If the specified action requires robot movement, the required information is sent to the robot control component of the system, which then transmits the commands to the robot controller for execution.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Findings<\/jats:title><jats:p>The voice portion of the system was tested on the factory floor in a \u201ctypical\u201d manufacturing environment, which was right at the maximum allowable average decibel level specified by OSHA. The findings show that a modern\/standard MS Speech API voice recognition system can achieve a 100 per cent accuracy of simple commands; although at the noisy levels of 89 decibels on average, every one out of six commands had to be repeated. The vision component was test of 72 test subjects who had no prior knowledge of this work. The system accurately recognized what the test subjects were pointing at 95 per cent of the time within five seconds of hand readjusting.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Research limitations\/implications<\/jats:title><jats:p>The vision component suffers from the \u201ctypical\u201d problems: very shiny surfaces can cause problems; very poor contrast between the pointing hand and the background; and occlusions. Currently the system can only handle a limited amount of depth recovery using a spring mounted gripper. 
A second camera (future work) needs to be incorporated in order to handle large depth variations in the work area.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Practical implications<\/jats:title><jats:p>This system could have a significant impact on how factory\u2010floor workers interact with robotic equipment.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Originality\/value<\/jats:title><jats:p>The testing of the voice system on a factory floor, although simple, is very important. It demonstrates the viability of this component of the system and debunks arguments that factories are simply too noisy for current voice technology. The unique finger\u2010pointing algorithm developed by the authors, in particular the manner in which the pointing vector was constructed, is also an important contribution to the field. Furthermore, very few papers report results of non\u2010experts using their pointing algorithms. The paper reports concrete results that show the system is intuitive and user\u2010friendly to \u201claypersons\u201d.<\/jats:p><\/jats:sec>","DOI":"10.1108\/01439911211268796","type":"journal-article","created":{"date-parts":[[2013,4,23]],"date-time":"2013-04-23T05:30:27Z","timestamp":1366695027000},"page":"592-600","source":"Crossref","is-referenced-by-count":23,"title":["Pick\u2010and\u2010place application development using voice and visual commands"],"prefix":"10.1108","volume":"39","author":[{"given":"Sebastian","family":"van Delden","sequence":"first","affiliation":[]},{"given":"Michael","family":"Umrysh","sequence":"additional","affiliation":[]},{"given":"Carlos","family":"Rosario","sequence":"additional","affiliation":[]},{"given":"Gregory","family":"Hess","sequence":"additional","affiliation":[]}],"member":"140","reference":[{"key":"key2022020620002355000_b1","doi-asserted-by":"crossref","unstructured":"Bartholomew, J. and Miller, G. (1988), \u201cVoice control for noisy industrial environments\u201d, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vol. 3, pp. 1509\u201010.","DOI":"10.1109\/IEMBS.1988.95354"},{"key":"key2022020620002355000_b2","unstructured":"Bradski, G. and Kaehler, A. (2008), Learning OpenCV: Computer Vision with the OpenCV Library, O'Reilly Media, Sebastopol, CA."},{"key":"key2022020620002355000_b3","unstructured":"Chang, C., Chen, J., Tai, W. and Han, C. (2006), \u201cNew approach for static gesture recognition\u201d, Journal of Information Science and Engineering, Vol. 22, pp. 1047\u201057."},{"key":"key2022020620002355000_b4","unstructured":"Craig, J. (2004), Introduction to Robotics: Mechanics and Control, 3rd ed., Prentice\u2010Hall, Upper Saddle River, NJ."},{"key":"key2022020620002355000_b5","doi-asserted-by":"crossref","unstructured":"Droeschel, D., Stuckler, J. and Behnke, S. (2011), \u201cLearning to interpret pointing gestures with a time\u2010of\u2010flight camera\u201d, Proceedings of the 6th ACM\/IEEE Conference on Human\u2010Robot Interaction, Lausanne, Switzerland.","DOI":"10.1145\/1957656.1957822"},{"key":"key2022020620002355000_b6","doi-asserted-by":"crossref","unstructured":"Erol, A., Bebis, G., Nicolescu, M., Boyle, R. and Twombly, X. (2007), \u201cVision\u2010based hand pose estimation: a review\u201d, Computer Vision and Image Understanding, Vol. 108 Nos 1\/2, pp. 
52\u201073.","DOI":"10.1016\/j.cviu.2006.10.012"},{"key":"key2022020620002355000_b7","unstructured":"Grammar (2011), Grammar Format Tags for the Microsoft Speech API, MSDN Library, available at: http:\/\/msdn2.microsoft.com\/en\u2010us\/library\/ms723634.aspx (accessed 29 October 2011)."},{"key":"key2022020620002355000_b8","unstructured":"Guan, H., Chang, J., Chen, L. and Feris, R. (2006), \u201cMulti\u2010view appearance\u2010based 3D hand pose estimation\u201d, Proceedings of the Conference on Computer Vision and Pattern Recognition, pp. 154\u201060."},{"key":"key2022020620002355000_b9","unstructured":"H\u00e4gele, M., Schaaf, W. and Helms, E. (2002), \u201cRobot assistants at manual workplaces: effective co\u2010operation and safety aspects\u201d, Proceedings of the 33rd International Symposium on Robotics, Stockholm, Sweden."},{"key":"key2022020620002355000_b10","doi-asserted-by":"crossref","unstructured":"Holada, M. and Pelc, M. (2008), \u201cThe robot voice\u2010control system with interactive learning\u201d, in Lazinica, A. (Ed.), New Developments in Robotics Automation and Control, InTech, Rijeka.","DOI":"10.5772\/6284"},{"key":"key2022020620002355000_b11","unstructured":"Hollmann, R. and H\u00e4gele, M. (2008), \u201cThe use of voice control for industrial robots in noisy manufacturing environments\u201d, Proceedings of the 39th International Symposium on Robotics, Seoul, Korea, pp. 14\u201018."},{"key":"key2022020620002355000_b12","unstructured":"Kawarazaki, N., Yoshidome, T. and Nishihara, K. (2004), \u201cAn assistive robot system using gesture and voice instructions\u201d, Proceedings of the 2nd Cambridge Workshop on Universal Access and Assistive Technology (Incorporating the 5th Cambridge Workshop on Rehabilitation Robotics)."},{"key":"key2022020620002355000_b13","unstructured":"Kehl, R. and van Gool, K. (2004), \u201cReal\u2010time pointing gesture recognition for an immersive environment\u201d, Proceedings of Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Korea."},{"key":"key2022020620002355000_b14","unstructured":"OSHA (2011), Factsheet: Laboratory Safety Noise. Occupational Safety and Health Administration (OSHA), available at: www.osha.gov\/Publications\/laboratory\/OSHAfactsheet\u2010laboratory\u2010safety\u2010noise.pdf (accessed 29 October 2011).."},{"key":"key2022020620002355000_b15","doi-asserted-by":"crossref","unstructured":"Park, C., Roh, M. and Lee, S. (2008), \u201cReal\u2010time 3D pointing gesture recognition in mobile space\u201d, Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, Amsterdam, The Netherlands, pp. 1\u20106.","DOI":"10.1109\/AFGR.2008.4813448"},{"key":"key2022020620002355000_b16","doi-asserted-by":"crossref","unstructured":"Pires, J. (2005), \u201cRobot\u2010by\u2010voice: experiments on commanding an industrial robot using the human voice\u201d, Industrial Robot: An International Journal, Vol. 32 No. 6.","DOI":"10.1108\/01439910510629244"},{"key":"key2022020620002355000_b17","doi-asserted-by":"crossref","unstructured":"Quek, F., McNeill, D., Bryll, R., Duncan, S., Ma, X., Kirbas, C., McCullough, K. and Ansari, A. (2002), \u201cMultimodal human discourse: gesture and speech\u201d, ACM Transactions on Computer\u2010Human Interaction, Vol. 9 No. 3, pp. 171\u201093.","DOI":"10.1145\/568513.568514"},{"key":"key2022020620002355000_b18","doi-asserted-by":"crossref","unstructured":"Storring, M., Granum, E. and Moeslund, T. 
(2001), \u201cA natural interface to a virtual environment through computer vision\u2010estimated pointing gestures\u201d, in the Workshop on Gesture and Sign Language Based Human\u2010Computer Interaction, London, pp. 59\u201063.","DOI":"10.1007\/3-540-47873-6_6"},{"key":"key2022020620002355000_b20","unstructured":"van Delden, S. and Overcash, B. (2008), \u201cTowards voice\u2010guided robotic manipulator jogging\u201d, Proceedings of the 12th World Multiconference on Systemics, Cybernetics and Informatics, Orlando, FL, Vol. 3, pp. 138\u201044."},{"key":"key2022020620002355000_b19","doi-asserted-by":"crossref","unstructured":"van Delden, S. and Umrysh, M. (2011), \u201cVisual detection of objects in a robotic work area using hand gestures\u201d, Proceedings of the 9th IEEE International Symposium on Robotics and Sensor Environments, Montreal, Canada, pp. 237\u201043.","DOI":"10.1109\/ROSE.2011.6058529"},{"key":"key2022020620002355000_b21","doi-asserted-by":"crossref","unstructured":"Woods, W. (1970), \u201cTransition network grammars for natural language analysis\u201d, Communications of the ACM, Vol. 13, pp. 591\u2010602.","DOI":"10.1145\/355598.362773"},{"key":"key2022020620002355000_b22","unstructured":"Sato, Y., Kobayashi, Y. and Koike, H. (2000), \u201cFast tracking of hands and fingertips in infrared images for augmented desk interface\u201d, Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, IEEE Computer Society, Washington, DC, pp. 462\u20107."}],"container-title":["Industrial Robot: An International Journal"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.emeraldinsight.com\/doi\/full-xml\/10.1108\/01439911211268796","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/01439911211268796\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/01439911211268796\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:50:58Z","timestamp":1753401058000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/ir\/article\/39\/6\/592-600\/179002"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,10,12]]},"references-count":22,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2012,10,12]]}},"alternative-id":["10.1108\/01439911211268796"],"URL":"https:\/\/doi.org\/10.1108\/01439911211268796","relation":{},"ISSN":["0143-991X"],"issn-type":[{"type":"print","value":"0143-991X"}],"subject":[],"published":{"date-parts":[[2012,10,12]]}}}