Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing
CVPR 2022

Google Research

Given a single image, we reconstruct the full 3D geometry – including self-occluded (or unseen) regions – of the photographed person, together with albedo and shaded surface color. Our end-to-end trainable pipeline requires no image matting and reconstructs all outputs in a single step.


We present PHORHUM, a novel, end-to-end trainable deep neural network method for photorealistic 3D human reconstruction from a single monocular RGB image. Our pixel-aligned method estimates detailed 3D geometry and, for the first time, the unshaded surface color together with the scene illumination. Observing that 3D supervision alone is not sufficient for high-fidelity color reconstruction, we introduce patch-based rendering losses that enable reliable color reconstruction on visible parts of the human, and detailed and plausible color estimation for the non-visible parts. Moreover, our method specifically addresses methodological and practical limitations of prior work in representing geometry, albedo, and illumination effects, in an end-to-end model where these factors can be effectively disentangled. In extensive experiments, we demonstrate the versatility and robustness of our approach. Our state-of-the-art results validate the method qualitatively and quantitatively across several metrics, for both geometric and color reconstruction.


Overview of our method. The feature extractor network \(G\) produces pixel-aligned features \(\boldsymbol{z}_{\boldsymbol{x}}\) from an input image \(\mathbf{I}\) for all points in space \(\boldsymbol{x}\). The implicit signed distance function network \(f\) computes the distance \(d\) to the closest surface given a point and its feature. Additionally, \(f\) returns albedo colors \(\boldsymbol{a}\) defined for surface points. The shading network \(s\) predicts the shading for a surface point given its surface normal \(\boldsymbol{n}_{\boldsymbol{x}}\) and the illumination \(\boldsymbol{l}\). On the right, we show the reconstruction of geometry and albedo colors, and the shaded 3D geometry.
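
The data flow described in the overview can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the PHORHUM implementation: the function names (`extract_features`, `pixel_aligned`, `sdf_and_albedo`, `shade`, `reconstruct`) are hypothetical, the learned networks \(G\), \(f\), and \(s\) are replaced by hand-written stand-ins (a coordinate-augmented image, a sphere of radius 0.5, and a Lambertian term), the camera is assumed orthographic, the normal is assumed to be the normalized gradient of \(d\), and the shaded color is assumed to be the product of shading and albedo.

```python
import numpy as np

def extract_features(image):
    """Stand-in for G (assumption): image channels plus pixel coordinates, shape (H, W, 5)."""
    h, w, _ = image.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    return np.concatenate([image, xs[..., None], ys[..., None]], axis=-1)

def pixel_aligned(features, points):
    """Bilinearly sample z_x at the orthographic projection of points (N, 3), x and y in [-1, 1]."""
    h, w, _ = features.shape
    u = np.clip((points[:, 0] + 1) * 0.5 * (w - 1), 0, w - 1)
    v = np.clip((points[:, 1] + 1) * 0.5 * (h - 1), 0, h - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, w - 1), np.minimum(v0 + 1, h - 1)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    top = features[v0, u0] * (1 - du) + features[v0, u1] * du
    bottom = features[v1, u0] * (1 - du) + features[v1, u1] * du
    return top * (1 - dv) + bottom * dv

def sdf_and_albedo(points, feats):
    """Stand-in for f (assumption): signed distance to a sphere, albedo read from the features."""
    d = np.linalg.norm(points, axis=1) - 0.5
    a = np.clip(feats[:, :3], 0.0, 1.0)
    return d, a

def surface_normals(features, points, eps=1e-4):
    """Normal n_x as the normalized gradient of d, estimated by central differences."""
    grad = np.zeros_like(points)
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = eps
        d_plus, _ = sdf_and_albedo(points + step, pixel_aligned(features, points + step))
        d_minus, _ = sdf_and_albedo(points - step, pixel_aligned(features, points - step))
        grad[:, axis] = (d_plus - d_minus) / (2 * eps)
    return grad / np.linalg.norm(grad, axis=1, keepdims=True)

def shade(normals, light):
    """Stand-in for s (assumption): Lambertian term; light = (ambient, dir_x, dir_y, dir_z)."""
    ambient, direction = light[0], light[1:]
    return ambient + np.maximum(0.0, normals @ direction)[:, None]

def reconstruct(image, points, light):
    """Run G -> f -> normals -> s for a batch of query points."""
    features = extract_features(image)
    z = pixel_aligned(features, points)
    d, a = sdf_and_albedo(points, z)
    n = surface_normals(features, points)
    shading = shade(n, light)
    # Assumption: shaded color is the per-channel product of shading and albedo.
    return {"d": d, "a": a, "n": n, "shading": shading, "color": shading * a}

# Usage: query three points on the toy sphere under a light along +z.
image = np.full((8, 8, 3), 0.5)
points = np.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]])
result = reconstruct(image, points, np.array([0.2, 0.0, 0.0, 1.0]))
print(result["d"], result["color"])
```

Keeping albedo and shading as separate outputs, as in the overview, means the same geometry and unshaded color can be re-shaded by changing only the illumination argument.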





  title     = {Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing},
  author    = {Thiemo Alldieck and Mihai Zanfir and Cristian Sminchisescu},
  year      = {2022},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}