# Converting SMPL to SMPL-X
The SMPL body model [1] is in wide use in computer vision and graphics, for both
research and industrial applications. Popular as it is, SMPL lacks articulated
hands and an expressive face. The SMPL-X model [3] addresses this and includes
both.

Many legacy applications and datasets are built on SMPL, and people naturally
want to "upgrade" them to SMPL-X. While SMPL-X is based on the SMPL technology,
the two models are not completely interchangeable.

Importantly, the shape and pose parameters of SMPL and SMPL-X look tantalizingly
similar. Sadly, you cannot simply take them from one model and use them with the
other. In particular, the joint locations in SMPL-X differ from those in SMPL,
meaning that the pose ($\theta$) parameters are not interchangeable.

Here we describe a tool to convert back and forth between the models. This
involves fitting one model to the other to recover the right parameters.
The first step in this process is to establish a mapping between SMPL and
SMPL-X, since their topologies differ. For this, we assume we have a SMPL-X
template mesh registered to the SMPL template. Now that the two surfaces match,
we compute and store the following quantities:

* For each SMPL-X vertex, find the nearest point on the SMPL mesh and store:
    * The index $t_i$ of the triangle on which the nearest point is located.
    * The barycentric coordinates $\left[a_i, b_i, c_i\right]$ of the nearest
      point with respect to that SMPL triangle.
SMPL-X and SMPL share the same topology up to the neck, so the barycentric
coordinates of these points are a permutation of `[1.0, 0.0, 0.0]`. We also
store a mask of valid vertices, to remove points that have no match between the
two meshes, such as the eyeballs or the inner mouth. If we color-code the
correspondences, we end up with the following image, where the left mesh is SMPL
and the right one is SMPL-X:
 | |
Now that we have established the correspondences between the models, we can fit
SMPL-X to the SMPL annotations.
1. The first step is to build a mesh with the SMPL-X topology from the posed
   SMPL annotations (see the sketch after this list):
    1. If $t_i$ is the index of the corresponding SMPL triangle for the i-th
       SMPL-X vertex, then let $f_i \in \mathbb{N}^3$ be the 3 indices of the
       SMPL vertices that form the triangle.
    2. Let $m_i$ be the binary mask value for the validity of this vertex.
    3. The i-th vertex is computed from the barycentric coordinates
       $\left[a_i, b_i, c_i\right]$ as:
       $v_i^{SMPL-X} = a_i \, v_{f_i^0}^{SMPL} + b_i \, v_{f_i^1}^{SMPL} + c_i \, v_{f_i^2}^{SMPL}$
2. Now that we have a mesh in SMPL-X topology, we need to find the SMPL-X
   parameters, i.e. pose $\theta$, shape $\beta$, expression $\psi$ and
   translation $\gamma$, that best explain it. We use an iterative optimization
   scheme to recover the parameters (a code sketch follows below):
    1. Optimize over the pose with a 3D edge term, using only the valid edges,
       i.e. those whose end points exist on both meshes:
       $L_1\left(\theta\right) = \sum_{(i, j) \in \mathcal{E}} m_i m_j \left\lVert (v_i - v_j) - (\hat{v}_i - \hat{v}_j) \right\rVert_2^2$
    2. Optimize over the translation vector $\gamma$ to align the two models:
       $L_2\left(\gamma\right) = \sum_{i} m_i \left\lVert v_i - \hat{v}_i \right\rVert_2^2$
    3. Optimize over all parameters to get the tightest possible fit:
       $L_3\left(\theta, \beta, \psi, \gamma\right) = \sum_{i} m_i \left\lVert v_i - \hat{v}_i \right\rVert_2^2$
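The mesh-building step (step 1 above) is a single gather plus a weighted sum;
a minimal sketch, reusing the hypothetical `tri_ids` and `barys` arrays from
the correspondence stage:

```python
import numpy as np

def transfer_mesh(smpl_verts: np.ndarray, smpl_faces: np.ndarray,
                  tri_ids: np.ndarray, barys: np.ndarray) -> np.ndarray:
    """Build vertices in SMPL-X topology from posed SMPL vertices."""
    # f_i: the three SMPL vertex indices of the corresponding triangle t_i.
    corners = smpl_verts[smpl_faces[tri_ids]]   # (V_smplx, 3, 3)
    # v_i = a_i * v_{f_i^0} + b_i * v_{f_i^1} + c_i * v_{f_i^2}
    return np.einsum('vc,vcd->vd', barys, corners)
```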
So now, if you have data in SMPL format, you can convert it to SMPL-X. This
should allow you to use it for training.
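To make the fitting stage (step 2 above) concrete, here is a rough PyTorch
sketch of the three-stage scheme using the `smplx` package. The optimizer,
learning rate, step counts, and zero initialization are illustrative
assumptions, not the tool's exact recipe; `target_verts`, `valid_mask`, and
`edges` (the $(i, j)$ vertex pairs of the SMPL-X template edges) come from the
transfer step above:

```python
import torch
import smplx

def fit_smplx_to_mesh(target_verts, valid_mask, edges, model_path,
                      steps=60, lr=1e-2):
    """Recover SMPL-X parameters (theta, beta, psi, gamma) that best
    explain a target mesh given in SMPL-X topology."""
    model = smplx.create(model_path, model_type='smplx', use_pca=False)
    target = torch.as_tensor(target_verts, dtype=torch.float32)[None]
    m = torch.as_tensor(valid_mask, dtype=torch.float32)
    edges = torch.as_tensor(edges, dtype=torch.long)
    i, j = edges[:, 0], edges[:, 1]

    params = {name: torch.zeros(1, dim, requires_grad=True)
              for name, dim in [('global_orient', 3), ('body_pose', 63),
                                ('transl', 3), ('betas', 10),
                                ('expression', 10)]}

    def edge_loss(v):  # L1: masked 3D edge term, drives the pose
        d = (v[:, i] - v[:, j]) - (target[:, i] - target[:, j])
        return ((m[i] * m[j])[None, :, None] * d ** 2).sum()

    def v2v_loss(v):   # L2 / L3: masked vertex-to-vertex term
        return (m[None, :, None] * (v - target) ** 2).sum()

    def optimize(variables, loss_fn):
        opt = torch.optim.Adam(variables, lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(model(**params).vertices).backward()
            opt.step()

    # Stage 1: pose via the edge term; stage 2: translation; stage 3: all.
    optimize([params['global_orient'], params['body_pose']], edge_loss)
    optimize([params['transl']], v2v_loss)
    optimize(list(params.values()), v2v_loss)
    return params
```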
For the inverse mapping, from SMPL-X to SMPL, we follow a similar process: we
generate the correspondences and then optimize over the SMPL parameters that
best fit the transferred mesh. Of course, if you choose to do this, you will
lose all information about the hands and the face, since SMPL is not able to
model these.
For SMPL and SMPL+H [2], the process is easier, since they share the same
topology. We can therefore skip the first step, since we already know the
correspondences (see the sketch below), compute a SMPL or SMPL+H mesh, and
estimate the parameters of the other model. If we wish to transfer SMPL+H
annotations, such as the AMASS motion capture data [4], to SMPL-X, then we can
use the correspondences of the SMPL to SMPL-X mapping.
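With a shared topology the stored correspondences degenerate to the identity
map: each vertex corresponds to itself, so any triangle containing it works,
with all the barycentric weight on the matching corner. A minimal sketch (the
function name and the 6890-vertex default for SMPL/SMPL+H are assumptions):

```python
import numpy as np

def identity_correspondences(faces: np.ndarray, num_verts: int = 6890):
    """Trivial correspondences between two meshes that share a topology,
    e.g. SMPL and SMPL+H: vertex i maps to itself."""
    tri_ids = np.zeros(num_verts, dtype=np.int64)
    barys = np.zeros((num_verts, 3))
    for t, tri in enumerate(faces):
        for corner, v in enumerate(tri):
            # Any triangle containing vertex v works; the barycentrics
            # are then a permutation of [1.0, 0.0, 0.0], as noted above.
            tri_ids[v] = t
            barys[v] = 0.0
            barys[v, corner] = 1.0
    return tri_ids, barys
```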
## Bibliography
[1]: Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: A
skinned multi-person linear model. ACM Transactions on Graphics (TOG) -
Proceedings of ACM SIGGRAPH Asia 34(6), 248:1–248:16 (2015)

[2]: Romero, J., Tzionas, D., Black, M.J.: Embodied hands: Modeling and
capturing hands and bodies together. ACM Transactions on Graphics (TOG) -
Proceedings of ACM SIGGRAPH Asia 36(6), 245:1–245:17 (2017)

[3]: Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A.A.A.,
Tzionas, D., Black, M.J.: Expressive body capture: 3D hands, face, and body
from a single image. In: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition (CVPR). pp. 10967–10977 (2019)

[4]: Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS:
Archive of motion capture as surface shapes. In: Proceedings of the IEEE/CVF
International Conference on Computer Vision (ICCV) (2019)