I am interested in 'spatial intelligence' and 'physical intelligence'. Specifically, I am interested in the following questions:
1. How can models perceive, understand, and reason in 3D space?
2. How can we equip models with physical commonsense knowledge, and ultimately have them act in the real world?
A simple yet effective framework that incorporates both visual and tactile information to guide the behavior of an agent.
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
Zixian Liu*, Mingtong Zhang*, Yunzhu Li
ICRA, 2025
Best Paper Finalist (top 10%) @ CoRL 2024 Workshop
Project Page / arXiv / Code
KUDA is an open-vocabulary manipulation system that unifies the visual prompting of vision language models (VLMs) and dynamics modeling with keypoints.
Misc
In my spare time, I like playing soccer, watching anime, and playing video games.
I am also an amateur game developer using Unity and RPG Maker.
Most of the time I make games for my own entertainment.
Perhaps because of a slightly obsessive-compulsive streak, writing clean, well-structured code makes me feel great.
In addition, I like to listen to Japanese music,
including the soundtracks of FF15, Xenoblade, Nier, etc.,
and music made by famous Japanese musicians, such as Joe Hisaishi and Hiroyuki Sawano.