SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

CVPR 2024

arXiv Paper

Zhijing Shao1,2, Zhaolong Wang, Zhuang Li, Duotun Wang1, Xiangru Lin2, Yu Zhang, Mingming Fan1,3, Zeyu Wang1,3*

1The Hong Kong University of Science and Technology (Guangzhou)
2Prometheus Vision Technology Co., Ltd.
3The Hong Kong University of Science and Technology
* Corresponding author @ Creative Intelligence and Synergy Lab     

Abstract



We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders at over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human, with explicit mesh geometry and implicit appearance modeling via Gaussian Splatting. The Gaussians are defined by barycentric coordinates and a displacement on a triangle mesh, following the Phong surface formulation. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while they walk on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans where the mesh captures low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by the mesh, which makes them compatible with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos of both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets.
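The abstract describes driving the Gaussians' rotation directly from the mesh rather than from a learned LBS field. As a rough illustration only (not the authors' code; the function names and the tangent-frame construction are our assumptions), one way to obtain a mesh-driven rotation is to build an orthonormal frame per triangle and map the canonical frame onto the posed one:

```python
import numpy as np

def triangle_frame(verts):
    """Orthonormal frame (3x3) of a triangle from its 3 vertex positions.
    Columns are tangent, bitangent, normal (an illustrative choice)."""
    e1 = verts[1] - verts[0]
    e2 = verts[2] - verts[0]
    n = np.cross(e1, e2)
    t = e1 / np.linalg.norm(e1)
    n = n / np.linalg.norm(n)
    b = np.cross(n, t)
    return np.stack([t, b, n], axis=1)

def mesh_driven_rotation(canon_verts, posed_verts):
    """Rotation carrying a Gaussian embedded in the canonical triangle
    to the posed triangle (ignores non-rigid in-plane stretch)."""
    return triangle_frame(posed_verts) @ triangle_frame(canon_verts).T
```

In this sketch, the resulting rotation would be composed with each Gaussian's pose-invariant quaternion q; the actual per-Gaussian pose-dependent terms (δq, δs) in the paper are learned, not derived this way.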



Method


 

The pipeline of our method. SplattingAvatar learns 3D Gaussians with trainable embeddings on the canonical mesh. The motion and deformation of the mesh explicitly carry the Gaussians into the posed space for differentiable rasterization. Both the Gaussian and embedding parameters are optimized during training. The position µ is the barycentric point P plus a displacement d along the interpolated normal vector n. The pose-dependent quaternion and scaling (δq, δs) and the pose-invariant quaternion, scaling, opacity, and color (q, s, o, c) together define the properties of the Gaussians.
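The embedding described in the caption, µ = P + d·n with P a barycentric point and n the Phong-interpolated normal, can be sketched as follows (an illustrative numpy snippet, not the authors' implementation; the function name and argument layout are assumptions):

```python
import numpy as np

def gaussian_position(verts, normals, bary, d):
    """Position mu of a mesh-embedded Gaussian (Phong-surface style).

    verts:   (3, 3) positions of the embedding triangle's vertices
    normals: (3, 3) per-vertex normals of that triangle
    bary:    (3,) barycentric coordinates, summing to 1
    d:       scalar displacement along the interpolated normal
    """
    P = bary @ verts                 # barycentric point on the triangle
    n = bary @ normals
    n = n / np.linalg.norm(n)        # interpolated (Phong) normal
    return P + d * n                 # mu = P + d * n
```

Because (bary, d) are differentiable in the triangle's vertices, the same expression evaluated on the posed mesh moves the Gaussian into posed space, which is what makes the representation compatible with standard mesh animation.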

 



Demo Video


If the video does not play, please click here to watch it.


Running in Unity




Citation



  @inproceedings{SplattingAvatar:CVPR2024,
    title = {{SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting}},
    author = {Shao, Zhijing and Wang, Zhaolong and Li, Zhuang and Wang, Duotun and Lin, Xiangru and Zhang, Yu and Fan, Mingming and Wang, Zeyu},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year = {2024}
  }