I am interested in 'spatial intelligence' and 'physical intelligence'. Specifically, I am interested in the following questions:
1. How can models perceive, understand, and reason in 3D space?
2. How can we equip models with physical common sense, and ultimately have them act in the real world?
We introduce OnlineSI, a framework designed to understand
3D scenes in an online fashion and make detections from streaming video.
Publications
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
Zixian Liu*, Mingtong Zhang*, Yunzhu Li
ICRA, 2025
Best Paper Finalist (top 10%) @ CoRL 2024 Workshop
Project Page / arXiv / Code
KUDA is an open-vocabulary manipulation system that uses keypoints to unify the visual prompting of vision-language models (VLMs) with dynamics modeling.
Misc
In my spare time, I like playing soccer, watching anime, and playing video games.
I am also an amateur game developer using Unity and RPG Maker.
Most of the time I make games for my own entertainment.
Perhaps because of slightly obsessive-compulsive tendencies, I find writing clean, well-structured projects deeply satisfying.
I also like listening to Japanese music,
including the soundtracks of FF15, Xenoblade, Nier, etc.,
and music by famous Japanese musicians such as Joe Hisaishi and Hiroyuki Sawano.