{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T16:21:19Z","timestamp":1781194879436,"version":"3.54.1"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2014,9,23]],"date-time":"2014-09-23T00:00:00Z","timestamp":1411430400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2014,9,23]]},"abstract":"<jats:p>We present a novel method for real-time continuous pose recovery of markerless complex articulable objects from a single depth image. Our method consists of the following stages: a randomized decision forest classifier for image segmentation, a robust method for labeled dataset generation, a convolutional network for dense feature extraction, and finally an inverse kinematics stage for stable real-time pose recovery. 
As one possible application of this pipeline, we show state-of-the-art results for real-time puppeteering of a skinned hand-model.<\/jats:p>","DOI":"10.1145\/2629500","type":"journal-article","created":{"date-parts":[[2014,10,1]],"date-time":"2014-10-01T13:34:59Z","timestamp":1412170499000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":643,"title":["Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks"],"prefix":"10.1145","volume":"33","author":[{"given":"Jonathan","family":"Tompson","sequence":"first","affiliation":[{"name":"New York University, New York, NY"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Murphy","family":"Stein","sequence":"additional","affiliation":[{"name":"New York University, New York, NY"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yann","family":"Lecun","sequence":"additional","affiliation":[{"name":"New York University, New York, NY"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ken","family":"Perlin","sequence":"additional","affiliation":[{"name":"New York University, New York, NY"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2014,9,23]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"3GEAR. 2014. 3gear sytems hand-tracking development platform. http:\/\/www.threegear.com\/.  3GEAR. 2014. 3gear sytems hand-tracking development platform. http:\/\/www.threegear.com\/."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/882262.882311"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33783-3_46"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.969114"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208335"},{"key":"e_1_2_2_6_1","unstructured":"R. Collobert K. Kavukcuoglu and C. Farabet. 2011. 
Torch7: A matlab-like environment for machine learning. http:\/\/ronan.collobert.com\/pub\/matos\/2011_torch7_nipsw.pdf.  R. Collobert K. Kavukcuoglu and C. Farabet. 2011. Torch7: A matlab-like environment for machine learning. http:\/\/ronan.collobert.com\/pub\/matos\/2011_torch7_nipsw.pdf."},{"key":"e_1_2_2_7_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Couprie C.","unstructured":"C. Couprie , C. Farabet , L. Najman , and Y. Lecun . 2013. Indoor semantic segmentation using depth information . In Proceedings of the International Conference on Learning Representations. C. Couprie, C. Farabet, L. Najman, and Y. Lecun. 2013. Indoor semantic segmentation using depth information. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2006.10.012"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.231"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1364\/JOSAA.4.000629"},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","unstructured":"K. Jarrett K. Kavukcuoglu M. Ranzato and Y. Lecun. 2009. What is the best multi-stage architecture for object recognition? In Proceedings of the 12th IEEE International Conference on Computer Vision. 2146--2153.  K. Jarrett K. Kavukcuoglu M. Ranzato and Y. Lecun. 2009. What is the best multi-stage architecture for object recognition? In Proceedings of the 12th IEEE International Conference on Computer Vision. 2146--2153.","DOI":"10.1109\/ICCV.2009.5459469"},{"key":"e_1_2_2_12_1","article-title":"Human body part estimation from depth images via spatially-constrained deep learning. Pattern","author":"Jiu M.","year":"2013","unstructured":"M. Jiu , C. Wolf , G. W. Taylor , and A. Baskurt . 2013 . Human body part estimation from depth images via spatially-constrained deep learning. Pattern Recogn. Lett. (to appear). M. Jiu, C.
Wolf, G. W. Taylor, and A. Baskurt. 2013. Human body part estimation from depth images via spatially-constrained deep learning. Pattern Recogn. Lett. (to appear).","journal-title":"Recogn. Lett. (to appear)."},{"key":"e_1_2_2_13_1","volume-title":"Proceedings of the IEEE International Computer Vision Workshops. 1228--1234","author":"Keskin C.","unstructured":"C. Keskin , F. Kirac , Y. Kara , and L. Akarun . 2011. Real time hand pose estimation using depth sensors . In Proceedings of the IEEE International Computer Vision Workshops. 1228--1234 . C. Keskin, F. Kirac, Y. Kara, and L. Akarun. 2011. Real time hand pose estimation using depth sensors. In Proceedings of the IEEE International Computer Vision Workshops. 1228--1234."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33783-3_61"},{"key":"e_1_2_2_15_1","volume-title":"Proceedings of the Neural Information Processing Systems Conference. P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 1106--1114","author":"Krizhevsky A.","unstructured":"A. Krizhevsky , I. Sutskever , and G. Hinton . 2012. Imagenet classification with deep convolutional neural networks . In Proceedings of the Neural Information Processing Systems Conference. P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 1106--1114 . A. Krizhevsky, I. Sutskever, and G. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Proceedings of the Neural Information Processing Systems Conference. P. Bartlett, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., 1106--1114."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_2_17_1","volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.","volume":"2","author":"Lecun Y.","unstructured":"Y. Lecun , F. J. Huang , and L. Bottou . 2004. 
Learning methods for generic object recognition with invariance to pose and lighting . In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 2 . 97--104. Y. Lecun, F. J. Huang, and L. Bottou. 2004. Learning methods for generic object recognition with invariance to pose and lighting. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 2. 97--104."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/1731309.1731326"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2461912.2462019"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2448196.2448232"},{"key":"e_1_2_2_21_1","volume-title":"Proceedings of the IEEE International Conference on Signal and Image Processing Applications. 342--347","author":"Nagi J.","unstructured":"J. Nagi , F. Ducatelle , G. Di Caro , D. Ciresan , U. Meier , A. Giusti , F. Nagi , J. Schmidhuber , and L. Gambardella . 2011. Max-pooling convolutional neural networks for vision-based hand gesture recognition . In Proceedings of the IEEE International Conference on Signal and Image Processing Applications. 342--347 . J. Nagi, F. Ducatelle, G. Di Caro, D. Ciresan, U. Meier, A. Giusti, F. Nagi, J. Schmidhuber, and L. Gambardella. 2011. Max-pooling convolutional neural networks for vision-based hand gesture recognition. In Proceedings of the IEEE International Conference on Signal and Image Processing Applications. 342--347."},{"key":"e_1_2_2_22_1","volume-title":"Proceedings of the Neural Information Processing Systems Conference. 901--908","author":"Nowlan S. J.","unstructured":"S. J. Nowlan and J. C. Platt . 1995. A convolutional neural network hand tracker . In Proceedings of the Neural Information Processing Systems Conference. 901--908 . S. J. Nowlan and J. C. Platt. 1995. A convolutional neural network hand tracker. In Proceedings of the Neural Information Processing Systems Conference. 
901--908."},{"key":"e_1_2_2_23_1","volume-title":"Proceedings of the British Machine Vision Conference.","author":"Oikonomidis I.","unstructured":"I. Oikonomidis , N. Kyriazis , and A. Argyros . 2011. Efficient model-based 3D tracking of hand articulations using kinect . In Proceedings of the British Machine Vision Conference. I. Oikonomidis, N. Kyriazis, and A. Argyros. 2011. Efficient model-based 3D tracking of hand articulations using kinect. In Proceedings of the British Machine Vision Conference."},{"key":"e_1_2_2_24_1","volume-title":"Proceedings of the Neural Information Processing Systems Conference. 1017--1024","author":"Osadchy M.","unstructured":"M. Osadchy , Y. Lecun , M. L. Miller , and P. Perona . 2005. Synergistic face detection and pose estimation with energy-based model . In Proceedings of the Neural Information Processing Systems Conference. 1017--1024 . M. Osadchy, Y. Lecun, M. L. Miller, and P. Perona. 2005. Synergistic face detection and pose estimation with energy-based model. In Proceedings of the Neural Information Processing Systems Conference. 1017--1024."},{"key":"e_1_2_2_25_1","volume-title":"Proceedings of the 3rd European Conference on Computer Vision. 35--46","author":"Rehg J. M.","unstructured":"J. M. Rehg and T. Kanade . 1994. Visual tracking of high dof articulated structures: An application to human hand tracking . In Proceedings of the 3rd European Conference on Computer Vision. 35--46 . J. M. Rehg and T. Kanade. 1994. Visual tracking of high dof articulated structures: An application to human hand tracking. In Proceedings of the 3rd European Conference on Computer Vision. 35--46."},{"key":"e_1_2_2_26_1","volume-title":"Libhand: A library for hand articulation","author":"Saric M.","year":"2011","unstructured":"M. Saric . 2011 . Libhand: A library for hand articulation . http:\/\/www.libhand. org\/. M. Saric. 2011. Libhand: A library for hand articulation. http:\/\/www.libhand. 
org\/."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995316"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2341836.2341902"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995538"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1052623495282857"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2047196.2047269"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531369"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1599470.1599472"},{"key":"e_1_2_2_34_1","volume-title":"Proceedings of the World Automation Congress. 1--6.","author":"Yasuda T.","unstructured":"T. Yasuda , K. Ohkura , and Y. Matsumura . 2010. Extended PSO with partial randomization for large scale multimodal problems . In Proceedings of the World Automation Congress. 1--6. T. Yasuda, K. Ohkura, and Y. Matsumura. 2010. Extended PSO with partial randomization for large scale multimodal problems. In Proceedings of the World Automation Congress. 
1--6."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/2421731.2421738"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629500","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2629500","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:01:18Z","timestamp":1750230078000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629500"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,9,23]]},"references-count":35,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2014,9,23]]}},"alternative-id":["10.1145\/2629500"],"URL":"https:\/\/doi.org\/10.1145\/2629500","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,9,23]]},"assertion":[{"value":"2013-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-09-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}