3D body pose estimation from a single image focuses on recovering the 3D human body pose from a single RGB image. Current works are mostly data-driven, as shown in Figure 1. Data-driven methods, however, require large amounts of annotated data and generalize poorly to new environments.
To overcome these limitations, we propose a hybrid approach as shown in Figure 2, where we combine data with prior knowledge extracted from different sources to achieve efficient, robust, and generalizable 3D body pose estimation.
Our proposed model
An overview of our proposed model is shown below,
The proposed model takes a human body image as input and outputs a deformable 3D body model and camera parameters.
Body representation
We use SMPL [2], a deformable mesh model with 6,890 vertices, to represent the 3D human body, as shown in Figure 5a. SMPL has 72 pose parameters (3 per rotation), describing the rotations of the 23 body joints relative to their parents plus one global rotation, and 10 shape parameters characterizing variations in body height, body proportion, and weight.
The 3D positions of the body joints are a linear combination of the vertex positions. Meanwhile, the dense mesh model allows us to build correspondences between image pixels and body surface landmarks, as shown in Figure 5b.
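As a minimal sketch of this joint regression (the toy weight matrix below is made up; SMPL ships its own per-joint regressor):

```python
import numpy as np

def regress_joints(vertices, J_regressor):
    """Regress 3D joint positions as a linear combination of mesh vertices.

    vertices:    (V, 3) mesh vertex positions (V = 6890 for SMPL).
    J_regressor: (J, V) weight matrix; each row sums to 1 and selects the
                 vertices that localize one joint.
    """
    return J_regressor @ vertices  # (J, 3)
```

For example, a joint regressed as the midpoint of two vertices corresponds to a row with weights 0.5 and 0.5.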
We describe body surface landmarks with UV coordinates and annotate the UV coordinates of the visible body pixels in the image. Like body joints, the image positions of body surface landmarks can then be predicted by predicting the UV coordinates of the visible pixels. The mapping is shown above.
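One way to sketch this step (the function name and the `max_dist` threshold are illustrative assumptions, not from our method): per-pixel UV predictions can be matched to template landmark UVs by nearest-neighbor search in UV space, yielding image-to-surface correspondences.

```python
import numpy as np

def uv_to_landmarks(pixel_xy, pixel_uv, landmark_uv, max_dist=0.05):
    """Match each surface landmark to the pixel whose predicted UV is nearest.

    pixel_xy:    (N, 2) image coordinates of visible body pixels.
    pixel_uv:    (N, 2) predicted UV coordinates for those pixels.
    landmark_uv: (L, 2) template UV coordinates of the surface landmarks.
    Returns a dict {landmark_index: image_xy} for matches within max_dist.
    """
    corr = {}
    for l, uv in enumerate(landmark_uv):
        d = np.linalg.norm(pixel_uv - uv, axis=1)
        i = int(np.argmin(d))
        if d[i] < max_dist:          # landmark visible in the image
            corr[l] = pixel_xy[i]
    return corr
```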
Generic body pose constraints
Besides using SMPL parameters as direct supervision, we can further train the deep learning model by minimizing the reprojection error of 2D body joints and dense body surface landmarks, for which rich and diverse annotations exist. Meanwhile, we leverage generic human body constraints to regularize the training and avoid unrealistic estimates.
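A minimal sketch of the 2D joint reprojection term, assuming a weak-perspective camera (the camera model and parameter names here are illustrative):

```python
import numpy as np

def reprojection_loss(joints_3d, joints_2d_gt, scale, trans, visibility):
    """Mean L2 reprojection error of 3D joints under a weak-perspective camera.

    joints_3d:    (J, 3) predicted 3D joints.
    joints_2d_gt: (J, 2) annotated 2D joints.
    scale:        scalar camera scale; trans: (2,) image-plane translation.
    visibility:   (J,) 0/1 mask for annotated joints.
    """
    proj = scale * joints_3d[:, :2] + trans          # weak-perspective projection
    err = np.linalg.norm(proj - joints_2d_gt, axis=1)
    return float((visibility * err).sum() / max(visibility.sum(), 1))
```

The same term applies to dense surface landmarks by swapping in their predicted and annotated image positions.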
The summarized constraints are listed below.
Anatomy constraint
The human body is bilaterally symmetric, and body proportions are similar across individuals.
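The symmetry part of this constraint can be sketched as a penalty on mismatched left/right bone lengths (the pairing convention below is an illustrative assumption):

```python
import numpy as np

def symmetry_loss(joints_3d, left_bones, right_bones):
    """Penalize differences between mirrored left/right bone lengths.

    left_bones / right_bones: lists of (parent, child) joint-index pairs,
    ordered so that left_bones[i] mirrors right_bones[i].
    """
    loss = 0.0
    for (lp, lc), (rp, rc) in zip(left_bones, right_bones):
        l_len = np.linalg.norm(joints_3d[lc] - joints_3d[lp])
        r_len = np.linalg.norm(joints_3d[rc] - joints_3d[rp])
        loss += abs(l_len - r_len)   # zero for a perfectly symmetric body
    return float(loss)
```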
Biomechanics constraint
Human body joints have different degrees of freedom (DoFs) and different joint angle limits, as shown below.
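This constraint can be sketched as a hinge penalty on angles outside their limits (the specific bounds would come from biomechanics tables; the ones in the test are made up):

```python
import numpy as np

def joint_limit_loss(angles, lower, upper):
    """Quadratic hinge penalty for joint angles outside anatomical limits.

    angles, lower, upper: (J,) arrays in radians; the limits differ per
    joint and per rotation axis. Angles inside [lower, upper] cost nothing.
    """
    below = np.clip(lower - angles, 0.0, None)   # violation below the limit
    above = np.clip(angles - upper, 0.0, None)   # violation above the limit
    return float((below**2 + above**2).sum())
```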
Physical constraint
Different body parts cannot penetrate each other. An example penetration is shown above.
Geometric constraint
The human torso can be considered a rigid part whose joints form special geometric structures, e.g., collinear and coplanar configurations. Under full perspective projection, a unique 3D solution can be calculated given the projected positions, the body part lengths, and the camera information. Moreover, the ratio of projected to real bone length, and of projected body part area to real body surface area, encode relative depth and viewpoint information. An illustration is shown below.
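As a simplified illustration of how the length ratio encodes depth (this assumes full perspective projection and a bone roughly parallel to the image plane, a stronger assumption than our general formulation):

```python
def depth_from_bone_ratio(real_len, proj_len, focal):
    """Estimate depth from the real-to-projected bone length ratio.

    For a bone of length real_len parallel to the image plane at depth Z,
    perspective projection gives proj_len = focal * real_len / Z, so
    Z = focal * real_len / proj_len. Foreshortened bones would need the
    full geometric formulation instead.
    """
    return focal * real_len / proj_len
```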
Constraints formulation
During training, the summarized constraints are formulated as follows.
The penetration loss is realized by first detecting colliding triangles, and then penalizing the colliding triangle pairs.
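A simplified sketch of the idea (not the triangle-level detection itself): approximating each body part by a sphere proxy and penalizing overlapping pairs by their penetration depth. The sphere approximation is an assumption made here for brevity.

```python
import numpy as np

def penetration_loss(centers, radii):
    """Interpenetration penalty using one sphere proxy per body part.

    centers: (P, 3) sphere centers; radii: (P,) sphere radii.
    Two parts overlap when the center distance is less than the sum of
    their radii; the squared overlap depth is accumulated as the loss.
    """
    loss = 0.0
    P = len(radii)
    for i in range(P):
        for j in range(i + 1, P):
            d = np.linalg.norm(centers[i] - centers[j])
            overlap = radii[i] + radii[j] - d
            if overlap > 0:
                loss += overlap**2
    return float(loss)
```

The triangle-based version replaces the sphere test with mesh collision detection but keeps the same penalize-on-overlap structure.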
Our previous work
We proposed a method to solve upper body pose estimation and tracking by exploiting generic body pose constraints [1]. We use a Bayesian Network (BN) to represent the body pose as shown in Figure 3.
The joints' degrees of freedom and the dependencies between joints are directly captured by the number of nodes and the links between them. For joint angle limits and non-penetration constraints, which cannot be directly imposed in the network structure, we embed the constraints by learning from generated pseudo-data.
Demo
Baseline
A state-of-the-art model from [3],
Ours
A real-time demo of our proposed method,
References
[1] Data-Free Prior Model for Upper Body Pose Estimation and Tracking
[2] SMPL: A Skinned Multi-Person Linear Model
[3] OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
[4] Coherent Reconstruction of Multiple Humans from a Single Image