RoboSplat is a framework that leverages 3D Gaussian Splatting (3DGS) to generate novel demonstrations for RGB-based policy learning.

Starting from a single expert demonstration and multi-view images, our method generates diverse and visually realistic data for policy learning, enabling robust performance across six types of generalization (object poses, object types, camera views, scene appearance, lighting conditions, and embodiments) in the real world.

Compared to previous 2D data augmentation methods, our approach achieves significantly better results across various generalization types. Notably, we achieve this within a unified framework.

RoboSplat Demonstration

Method

We start from a single manually collected demonstration and multi-view images that capture the whole scene. The former provides task-related keyframes, while the latter enables scene reconstruction. After aligning the reconstructed frame with the real-world frame and segmenting the scene into its components, we autonomously edit the scene to achieve six types of generalization.
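The pipeline above can be sketched in simplified form. This is a hypothetical illustration, not the actual implementation: real scene components are sets of 3D Gaussians with full parameters, whereas here each component is reduced to a list of Gaussian centers, and only one generalization type (object pose) is shown via random rigid translations applied consistently to the object and the task keyframes.

```python
import random
from dataclasses import dataclass

# Simplified sketch (assumed names, not the paper's API): a scene component
# is a list of 3D points standing in for segmented Gaussian centers.

@dataclass
class Demo:
    object_points: list   # segmented object Gaussians (centers only)
    keyframes: list       # task-related end-effector keyframes

def translate(points, offset):
    """Apply a rigid translation to every 3D point."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for (x, y, z) in points]

def augment_object_pose(demo, offset):
    """Move the object and retarget the grasp keyframes consistently,
    so the edited demonstration stays physically valid."""
    return Demo(
        object_points=translate(demo.object_points, offset),
        keyframes=translate(demo.keyframes, offset),
    )

def generate_demos(seed_demo, n, rng):
    """Produce n novel demonstrations from a single expert demo by
    sampling random in-plane object offsets."""
    demos = []
    for _ in range(n):
        offset = (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1), 0.0)
        demos.append(augment_object_pose(seed_demo, offset))
    return demos
```

In the full method, the same editing principle extends to the other generalization axes (object types, camera views, appearance, lighting, embodiments) by modifying the corresponding parts of the reconstructed 3DGS scene before rendering new RGB observations.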

Augmented Demonstrations


Real World Deployment


Five Tasks

Pick Object

Close Drawer

Pick-Place-Close

Dual-Pick-Place

Sweep

Generalization


Our Team

1Shanghai AI Laboratory, 2The Chinese University of Hong Kong, 3Shanghai Jiao Tong University
* Equal contribution

If you have any questions, please contact Sizhe Yang and Wenye Yu.