EmbodiSteer

Steering Embodiment-Agnostic Visuomotor Policies with Joint-Space Guidance for Zero-Shot Cross-Embodiment Deployment

Shihefeng Wang*, Kangchen Lv*, Mingrui Yu, Xiang Li

Department of Automation, Tsinghua University  |  Beijing Key Laboratory of Embodied Intelligence Systems  |  Institute for Embodied Intelligence and Robotics, Tsinghua University

*Equal contribution. Corresponding authors.


Abstract

Scalable robot imitation learning relies on large-scale heterogeneous data from diverse robots or body-free data, making Cartesian end-effector actions a key interface for embodiment-agnostic policy learning. However, end-effector-only abstraction leaves Cartesian policies unaware of the deployed robot body, making them brittle under robot-specific constraints such as whole-body collision avoidance. To overcome this limitation, we present EmbodiSteer, a training-free framework that steers embodiment-agnostic visuomotor policies toward zero-shot, embodiment-aware deployment. EmbodiSteer keeps policy learning in Cartesian space while efficiently lifting inference-time diffusion sampling into the target robot's joint space via forward kinematics and Jacobian-based updates. With whole-body collision-aware guidance over joint trajectories after each denoising step, the arm can be steered away from collisions while preserving learned end-effector behavior. Compared with Cartesian-only execution, EmbodiSteer reduces collision rate by 46.1% and improves task success rate by 28.5% across 9 simulated robots, and further achieves 90.0% collision rate reduction and 36.7% success rate increase on two physical robots in highly constrained scenarios.

+28.5% simulation success rate
-46.1% simulation collision rate
+36.7% real-world success rate
-90.0% real-world collision rate
EmbodiSteer teaser showing cross-embodiment deployment with obstacle-aware guidance

Real-World Results

Same UMI-trained checkpoint, zero-shot deployment on two robot embodiments

Compare Base Cartesian Policy and EmbodiSteer rollouts on UR5 and Franka Panda. Each task shares one checkpoint across the two robots.

All rollouts are shown at 2x speed.

Make Iced Coffee

Obstacle layout
UR5 Base Cartesian Policy
UR5 EmbodiSteer
Franka Panda Base Cartesian Policy
Franka Panda EmbodiSteer

Put Flower in Vase

Obstacle layout
UR5 Base Cartesian Policy
UR5 EmbodiSteer
Franka Panda Base Cartesian Policy
Franka Panda EmbodiSteer

Arrange Banana

Obstacle layout
UR5 Base Cartesian Policy
UR5 EmbodiSteer
Franka Panda Base Cartesian Policy
Franka Panda EmbodiSteer

Simulation Results

Evaluation across 9 robot embodiments

Simulation tasks and robot embodiments
Quantitative results in simulation across 9 robot embodiments

TSR denotes task success rate, and COR denotes collision rate.

Obstacles  Method          PlaceToast           TurnOnFaucet         MakeCoffee           Average
                           TSR ↑  RWD ↑  COR ↓  TSR ↑  RWD ↑  COR ↓  TSR ↑  RWD ↑  COR ↓  TSR ↑  RWD ↑  COR ↓
w/o Obs.   EE              96.8   .980   --     83.6   .937   --     89.7   .932   --     90.0   .950   --
           Joint           94.4   .965   --     87.9   .957   --     89.7   .929   --     90.7   .950   --
w/ Obs.    EE              43.4   .634   52.6   41.6   .752   61.8   22.2   .456   58.3   35.7   .614   57.6
           EE w/ Sampling  47.1   .661   51.3   43.9   .767   53.2   24.1   .478   60.0   38.4   .635   54.8
           EE w/ CBF       65.8   .764   15.3   56.3   .797   29.9   48.9   .606   8.6    57.0   .722   17.9
           Joint w/ CG     57.7   .687   0.2    10.3   .403   52.0   22.4   .424   40.6   30.1   .505   30.9
           EmbodiSteer     74.8   .829   4.6    60.6   .833   27.1   57.2   .670   2.8    64.2   .777   11.5
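
The headline simulation numbers (+28.5% success rate, -46.1% collision rate) can be recovered from the Average column of the table. The short check below assumes that the "Cartesian-only" baseline is the EE row with obstacles, and that the reported gains are absolute differences in percentage points; both are readings of the table rather than statements from the paper.

```python
# Average-column values copied from the results table (with obstacles).
# Assumption: the "EE" row is the Cartesian-only baseline.
baseline = {"TSR": 35.7, "COR": 57.6}
embodisteer = {"TSR": 64.2, "COR": 11.5}

# Absolute differences, in percentage points.
tsr_gain = round(embodisteer["TSR"] - baseline["TSR"], 1)
cor_drop = round(baseline["COR"] - embodisteer["COR"], 1)

print(f"TSR gain: +{tsr_gain} points, COR drop: -{cor_drop} points")
```

Under these assumptions the differences match the reported 28.5 and 46.1, which suggests the headline figures are percentage-point changes rather than relative changes.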

Method

Lift Cartesian denoising into joint space

EmbodiSteer method overview

Frozen Cartesian policy

The learned denoiser remains in the embodiment-agnostic end-effector action space.

Joint-space sampling

Forward kinematics and damped Jacobian updates lift the reverse-diffusion sample into the target robot's joints.
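
To make the lifting step concrete, here is a minimal sketch of a damped least-squares Jacobian update on a toy planar arm. The 3-link arm, its link lengths, the damping value, and the function names (`forward_kinematics`, `jacobian`, `lift_cartesian_update`) are all illustrative assumptions; the page does not specify the exact update rule used in EmbodiSteer.

```python
import numpy as np

# Hypothetical planar 3-link arm (link lengths in metres), for illustration only.
LINK_LENGTHS = np.array([0.4, 0.3, 0.2])

def forward_kinematics(q):
    """End-effector (x, y) position for joint angles q."""
    angles = np.cumsum(q)  # absolute angle of each link
    return np.array([np.sum(LINK_LENGTHS * np.cos(angles)),
                     np.sum(LINK_LENGTHS * np.sin(angles))])

def jacobian(q):
    """2x3 positional Jacobian d(x, y)/dq of the planar arm."""
    angles = np.cumsum(q)
    J = np.zeros((2, len(q)))
    for i in range(len(q)):
        # Joint i rotates every link from i onward.
        J[0, i] = -np.sum(LINK_LENGTHS[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(LINK_LENGTHS[i:] * np.cos(angles[i:]))
    return J

def lift_cartesian_update(q, dx, damping=0.05):
    """Map a small Cartesian update dx to joints via damped least squares:
    dq = J^T (J J^T + damping^2 I)^-1 dx."""
    J = jacobian(q)
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), dx)
    return q + dq

q0 = np.array([0.2, 0.9, 0.9])
dx = np.array([-0.01, 0.02])  # stand-in for a Cartesian denoising update
q1 = lift_cartesian_update(q0, dx)
achieved = forward_kinematics(q1) - forward_kinematics(q0)
print("requested:", dx, "achieved:", achieved)
```

The damping term trades a small tracking error for bounded joint steps near singular configurations, which is the usual motivation for damped rather than plain pseudoinverse updates.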

Whole-body guidance

CBF-inspired QP guidance steers robot-body motion away from known obstacles while preserving end-effector behavior.
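
As a rough illustration of CBF-style guidance, the sketch below solves the simplest possible case: a QP with a single linearized barrier constraint, which has a closed-form solution. It assumes a barrier value `h` (for example, a signed clearance between a robot link and an obstacle) and its joint-space gradient `grad_h`; these names, the gain `alpha`, and the helper `cbf_project` are hypothetical. The actual EmbodiSteer QP covers the whole body and also preserves end-effector behavior, which this sketch does not attempt.

```python
import numpy as np

def cbf_project(dq_nom, grad_h, h, alpha=0.5):
    """Return the joint step closest to dq_nom that satisfies the
    linearized barrier condition grad_h @ dq >= -alpha * h.

    Closed-form solution of: min ||dq - dq_nom||^2 subject to one constraint.
    """
    bound = -alpha * h
    violation = bound - grad_h @ dq_nom
    gg = grad_h @ grad_h
    if violation <= 0.0 or gg < 1e-12:
        # Nominal step is already safe (or the gradient gives no direction).
        return dq_nom.copy()
    # Shift the step along grad_h just enough to meet the constraint.
    return dq_nom + (violation / gg) * grad_h

dq_nom = np.array([0.10, -0.20, 0.05])  # nominal joint step (made-up values)
grad_h = np.array([0.0, 0.5, 0.2])      # made-up clearance gradient

dq_near = cbf_project(dq_nom, grad_h, h=0.02)  # close to the obstacle
dq_far = cbf_project(dq_nom, grad_h, h=1.0)    # far from the obstacle
print("near obstacle:", dq_near)
print("far from obstacle:", dq_far)
```

When the clearance is large, the nominal step passes through unchanged; when it is small, the step is modified only along the gradient direction, so joints that do not affect clearance (the first joint here) keep their nominal motion.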

Citation

BibTeX

@misc{wang2026embodisteer,
      title={EmbodiSteer: Steering Embodiment-Agnostic Visuomotor Policies with Joint-Space Guidance for Zero-Shot Cross-Embodiment Deployment}, 
      author={Wang, Shihefeng and Lv, Kangchen and Yu, Mingrui and Li, Xiang},
      year={2026},
      eprint={2606.12965},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2606.12965}, 
}