{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T16:45:01Z","timestamp":1779295501023,"version":"3.51.4"},"reference-count":53,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2003,6,1]],"date-time":"2003-06-01T00:00:00Z","timestamp":1054425600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2003,6]]},"abstract":"<jats:p>We present a method for recovering three-dimensional (3D) human body motion from monocular video sequences based on a robust image matching metric, incorporation of joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3D body tracking is challenging: besides the difficulty of matching an imperfect, highly flexible, self-occluding model to cluttered image features, realistic body models have at least 30 joint parameters subject to highly nonlinear physical constraints, and at least a third of these degrees of freedom are nearly unobservable in any given monocular image. For image matching we use a carefully designed robust cost metric combining robust optical flow, edge energy, and motion boundaries. The nonlinearities and matching ambiguities make the parameter-space cost surface multimodal, ill-conditioned and highly nonlinear, so searching it is difficult. We discuss the limitations of CONDENSATION-like samplers, and describe a novel hybrid search algorithm that combines inflated-covariance-scaled sampling and robust continuous optimization subject to physical constraints and model priors. 
Our experiments on challenging monocular sequences show that robust cost modeling, joint and self-intersection constraints, and informed sampling are all essential for reliable monocular 3D motion estimation.<\/jats:p>","DOI":"10.1177\/0278364903022006003","type":"journal-article","created":{"date-parts":[[2003,7,1]],"date-time":"2003-07-01T18:06:05Z","timestamp":1057082765000},"page":"371-391","source":"Crossref","is-referenced-by-count":158,"title":["Estimating Articulated Human Motion with Covariance Scaled Sampling"],"prefix":"10.1177","volume":"22","author":[{"given":"Cristian","family":"Sminchisescu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bill","family":"Triggs","sequence":"additional","affiliation":[{"name":"INRIA Rh\u00f4ne-Alpes, GRAVIR-CNRS, 655 avenue de l'Europe, 38330 Montbonnot, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2003,6,1]]},"reference":[{"key":"atypb1","doi-asserted-by":"publisher","DOI":"10.1145\/964965.808573"},{"key":"atypb2","doi-asserted-by":"crossref","unstructured":"Barron, C., and Kakadiaris, I. 2000. Estimating anthropometry and pose from a single image . In IEEE International Conference on Computer Vision and Pattern Recognition, pp. 669-676 .","DOI":"10.1109\/CVPR.2000.855884"},{"key":"atypb3","unstructured":"Black, M. 1992. Robust Incremental Optical Flow. PhD thesis, Yale University."},{"key":"atypb4","doi-asserted-by":"crossref","unstructured":"Black, M., and Anandan, P. 1996. The robust estimation of multiple motions: parametric and piecewise smooth flow fields . Computer Vision and Image Understanding 63(1): 75-104 .","DOI":"10.1006\/cviu.1996.0006"},{"key":"atypb5","unstructured":"Blake, A., North, B., and Isard, M. 1999. Learning multi-class dynamics . 
Advances in Neural Information Processing Systems 11: 389-395 ."},{"key":"atypb6","doi-asserted-by":"crossref","unstructured":"Brand, M. 1999. Shadow puppetry . In IEEE International Conference on Computer Vision, pp. 1237-1244 .","DOI":"10.1109\/ICCV.1999.790422"},{"key":"atypb7","unstructured":"Bregler, C., and Malik, J. 1998. Tracking people with twists and exponential maps . In IEEE International Conference on Computer Vision and Pattern Recognition."},{"key":"atypb8","unstructured":"Cham, T., and Rehg, J. 1999. A multiple hypothesis approach to figure tracking . In IEEE International Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 239-245 ."},{"key":"atypb9","unstructured":"Choo, K., and Fleet, D. 2001. People tracking using hybrid Monte Carlo filtering . In IEEE International Conference on Computer Vision."},{"key":"atypb10","doi-asserted-by":"crossref","unstructured":"Delamarre, Q., and Faugeras, O. 1999. 3D articulated models and multi-view tracking with silhouettes . In IEEE International Conference on Computer Vision.","DOI":"10.1109\/ICCV.1999.790292"},{"key":"atypb11","unstructured":"Deutscher, J., Blake, A., and Reid, I. 2000. Articulated body motion capture by annealed particle filtering . In IEEE International Conference on Computer Vision and Pattern Recognition."},{"key":"atypb12","unstructured":"Deutscher, J., Davison, A., and Reid, I. 2001. Automatic partitioning of high dimensional search spaces associated with articulated body motion capture . In IEEE International Conference on Computer Vision and Pattern Recognition."},{"key":"atypb13","doi-asserted-by":"crossref","unstructured":"Deutscher, J., North, B., Bascle, B., and Blake, A. 1999. Tracking through singularities and discontinuities by random sampling . In IEEE International Conference on Computer Vision, pp. 1144-1149 .","DOI":"10.1109\/ICCV.1999.790409"},{"key":"atypb14","unstructured":"Drummond, T., and Cipolla, R. 2001. 
Real-time tracking of highly articulated structures in the presence of noisy measurements . In IEEE International Conference on Computer Vision."},{"key":"atypb15","unstructured":"Fletcher, R. 1987. Practical Methods of Optimization, Wiley, New York ."},{"key":"atypb16","doi-asserted-by":"crossref","unstructured":"Gavrila, D., and Davis, L. 1996. 3D model based tracking of humans in action: a multiview approach . In IEEE International Conference on Computer Vision and Pattern Recognition, pp. 73-80 .","DOI":"10.1109\/CVPR.1996.517056"},{"key":"atypb17","doi-asserted-by":"crossref","unstructured":"Gon\u00e7alves, L., Di Bernardo, E., Ursella, E., and Perona, P. 1995. Monocular tracking of the human arm in 3D . In IEEE International Conference on Computer Vision, pp. 764-770 .","DOI":"10.1109\/ICCV.1995.466861"},{"key":"atypb18","doi-asserted-by":"crossref","unstructured":"Gordon, N., and Salmond, D. 1995. Bayesian state estimation for tracking and guidance using the bootstrap filter . Journal of Guidance, Control and Dynamics.","DOI":"10.2514\/3.21565"},{"key":"atypb19","doi-asserted-by":"crossref","unstructured":"Gordon, N., Salmond, D., and Smith, A. 1993. Novel approach to non-linear\/non-Gaussian Bayesian state estimation . IEE Proceedings F.","DOI":"10.1049\/ip-f-2.1993.0015"},{"key":"atypb20","unstructured":"Hanim-Humanoid Animation Working Group. 2002. Specifications for a Standard Humanoid. http:\/\/www.hanim.org\/Specifications\/H-Anim1.1\/."},{"key":"atypb21","unstructured":"Heap, T., and Hogg, D. 1998. Wormholes in shape space: tracking through discontinuous changes in shape . In IEEE International Conference on Computer Vision, pp. 334-349 ."},{"key":"atypb22","unstructured":"Howe, N., Leventon, M., and Freeman, W. 1999. Bayesian reconstruction of 3D human motion from single-camera video. Neural Information Processing Systems."},{"key":"atypb23","unstructured":"Isard, M., and Blake, A. 1998. CONDENSATION\u2014Conditional density propagation for visual tracking . 
International Journal of Computer Vision."},{"key":"atypb24","doi-asserted-by":"crossref","unstructured":"Ju, S., Black, M., and Yacoob, Y. October 1996. Cardboard people: a parameterized model of articulated motion . In 2nd Int. Conf. on Automatic Face and Gesture Recognition, pp. 38-44 .","DOI":"10.1109\/AFGR.1996.557241"},{"key":"atypb25","doi-asserted-by":"crossref","unstructured":"Kakadiaris, I., and Metaxas, D. 1996. Model-based estimation of 3D human motion with occlusion prediction based on active multi-viewpoint selection . In IEEE International Conference on Computer Vision and Pattern Recognition, pp. 81-87 .","DOI":"10.1109\/CVPR.1996.517057"},{"key":"atypb26","doi-asserted-by":"publisher","DOI":"10.1016\/0734-189X(85)90094-5"},{"key":"atypb27","doi-asserted-by":"crossref","unstructured":"Liu, J. 1996. Metropolized independent sampling with comparisons to rejection sampling and importance sampling . Statistics and Computing, 6.","DOI":"10.1007\/BF00162521"},{"key":"atypb28","unstructured":"MacCormick, J., and Blake, A. 1998. A probabilistic contour discriminant for object localisation . In IEEE International Conference on Computer Vision."},{"key":"atypb29","doi-asserted-by":"crossref","unstructured":"MacCormick, J., and Isard, M. 2000. Partitioned sampling, articulated objects, and interface-quality hand tracking . In European Conference on Computer Vision, Vol. 2, pp. 3-19 .","DOI":"10.1007\/3-540-45053-X_1"},{"key":"atypb30","unstructured":"van der Merwe, R., Doucet, A., de Freitas, N., and Wan, E. May 2000. The unscented particle filter. Technical Report CUED\/F-INFENG\/TR 380, Cambridge University, Department of Engineering."},{"key":"atypb31","unstructured":"Papageorgiou, C., and Poggio, T. 1999. Trainable pedestrian detection . In International Conference on Image Processing."},{"key":"atypb32","unstructured":"Pitt, M., and Shephard, N. 1997. Filtering via simulation: auxiliary particle filter . 
Journal of the American Statistical Association."},{"key":"atypb33","doi-asserted-by":"crossref","unstructured":"Plankers, R., and Fua, P. 2001. Articulated soft objects for video-based body modeling . In IEEE International Conference on Computer Vision, pp. 394-401 .","DOI":"10.1109\/ICCV.2001.937545"},{"key":"atypb34","doi-asserted-by":"crossref","unstructured":"Rehg, J., and Kanade, T. 1995. Model-based tracking of self occluding articulated objects . In IEEE International Conference on Computer Vision, pp. 612-617 .","DOI":"10.1109\/ICCV.1995.466882"},{"key":"atypb35","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.1994.1006"},{"key":"atypb36","doi-asserted-by":"crossref","unstructured":"Rosales, R., and Sclaroff, S. 2000. Inferring body pose without tracking body parts . In IEEE International Conference on Computer Vision and Pattern Recognition, pp. 721-727 .","DOI":"10.1109\/CVPR.2000.854946"},{"key":"atypb37","unstructured":"Sidenbladh, H., and Black, M. 2001. Learning image statistics for Bayesian tracking . In IEEE International Conference on Computer Vision."},{"key":"atypb38","doi-asserted-by":"crossref","unstructured":"Sidenbladh, H., Black, M., and Fleet, D. 2000. Stochastic tracking of 3D human figures using 2D image motion . In European Conference on Computer Vision.","DOI":"10.1007\/3-540-45053-X_45"},{"key":"atypb39","doi-asserted-by":"crossref","unstructured":"Sidenbladh, H., Black, M., and Sigal, L. 2002. Implicit probabilistic models of human motion for synthesis and tracking . In European Conference on Computer Vision.","DOI":"10.1007\/3-540-47969-4_52"},{"key":"atypb40","doi-asserted-by":"crossref","unstructured":"Sminchisescu, C. 2002a. Consistency and coupling in human model likelihoods . In IEEE International Conference on Automatic Face and Gesture Recognition, Washington DC, pp. 27-32 .","DOI":"10.1109\/AFGR.2002.1004125"},{"key":"atypb41","unstructured":"Sminchisescu, C. July 2002b. 
Estimation algorithms for ambiguous visual models\u2014three-dimensional human modeling and motion reconstruction in monocular video sequences. PhD thesis, Institut National Polytechnique de Grenoble (INRIA)."},{"key":"atypb42","doi-asserted-by":"crossref","unstructured":"Sminchisescu, C., Metaxas, D., and Dickinson, S. 2001. Improving the scope of deformable model shape and motion estimation . In IEEE International Conference on Computer Vision and Pattern Recognition, Hawaii, Vol. 1, pp. 485-492 .","DOI":"10.1109\/CVPR.2001.990514"},{"key":"atypb43","unstructured":"Sminchisescu, C., and Telea, A. 2002. Human pose estimation from silhouettes. A consistent approach using distance level sets . In WSCG International Conference for Computer Graphics, Visualization and Computer Vision, Czech Republic."},{"key":"atypb44","unstructured":"Sminchisescu, C., and Triggs, B. 2001a. A robust multiple hypothesis approach to monocular human motion tracking. Technical Report RR-4208, INRIA."},{"key":"atypb45","doi-asserted-by":"crossref","unstructured":"Sminchisescu, C., and Triggs, B. 2001b. Covariance-scaled sampling for monocular 3D body tracking . In IEEE International Conference on Computer Vision and Pattern Recognition, Hawaii, Vol. 1, pp. 447-454 .","DOI":"10.1109\/CVPR.2001.990509"},{"key":"atypb46","doi-asserted-by":"crossref","unstructured":"Sminchisescu, C., and Triggs, B. 2002a. Building roadmaps of local minima of visual models . In European Conference on Computer Vision, Copenhagen, Vol. 1, pp. 566-582 .","DOI":"10.1007\/3-540-47969-4_38"},{"key":"atypb47","doi-asserted-by":"crossref","unstructured":"Sminchisescu, C., and Triggs, B. 2002b. Hyperdynamics importance sampling . In European Conference on Computer Vision, Copenhagen, Vol. 1, pp. 769-783 .","DOI":"10.1007\/3-540-47969-4_51"},{"key":"atypb48","unstructured":"Sminchisescu, C., and Triggs, B. 2003. Kinematic jump processes for monocular 3D human tracking . 
In IEEE International Conference on Computer Vision and Pattern Recognition."},{"key":"atypb49","doi-asserted-by":"crossref","unstructured":"Taylor, C.J. 2000. Reconstruction of articulated objects from point correspondences in a single uncalibrated image . In IEEE International Conference on Computer Vision and Pattern Recognition, pp. 677-684 .","DOI":"10.1109\/CVPR.2000.855885"},{"key":"atypb50","doi-asserted-by":"crossref","unstructured":"Triggs, B., McLauchlan, P., Hartley, R., and Fitzgibbon, A. 2000. Bundle adjustment\u2014A modern synthesis. In Vision Algorithms: Theory and Practice, Springer-Verlag, Berlin .","DOI":"10.1007\/3-540-44480-7_21"},{"key":"atypb51","doi-asserted-by":"crossref","unstructured":"Vanderbilt, D., and Louie, S.G. 1984. A Monte Carlo simulated annealing approach to optimization over continuous variables . Journal of Computational Physics 56.","DOI":"10.1016\/0021-9991(84)90095-0"},{"key":"atypb52","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.1999.0758"},{"key":"atypb53","unstructured":"Zhu, S. C., and Mumford, D. 1997. Learning generic prior models for visual computation . 
IEEE Transactions on Pattern Analysis and Machine Intelligence 19(11)."}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364903022006003","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364903022006003","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:16:57Z","timestamp":1777457817000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0278364903022006003"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2003,6]]},"references-count":53,"aliases":["10.1177\/027836403128965132"],"journal-issue":{"issue":"6","published-print":{"date-parts":[[2003,6]]}},"alternative-id":["10.1177\/0278364903022006003"],"URL":"https:\/\/doi.org\/10.1177\/0278364903022006003","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2003,6]]}}}