Academic
About Me
I am a third-year undergraduate student in the School of Artificial Intelligence / Qian Xuesen Honors College at Xi’an Jiaotong University. I am currently interning at the Shanghai Artificial Intelligence Laboratory, working closely with Researcher Yilun Chen, and will be joining the lab under the supervision of Research Scientist Jiangmiao Pang.
My research philosophy is simple: I build things because it’s fun.
I’m not chasing lofty slogans — I just genuinely enjoy making robots move, do things, and occasionally accomplish something useful. If a project eventually turns into a paper, great; if it only ends up as a GIF of a robot knocking over a mug, that’s fine too.
Right now, I’m focused on large-scale synthetic data generation, aiming to use it to connect Vision-Language Models with real-world actions. I see simulation as a critical stage in this process: it allows ideas to be tested, refined, and scaled before being deployed in the real world; incorporating massive amounts of real-world data can then make the system truly grounded. At the same time, I aim to build a systematic engineering setup that runs smoothly and reliably, keeping the distance from an idea to a working prototype as short as possible.
Here is my academic CV; feel free to download it.
Download CV
Research Interests
Robotic Manipulation
Robotic Manipulation, Grasping, Dexterous Control, Object Interaction
VLA
Vision-Language-Action Models, Multi-modal Learning, Embodied AI
Simulation Platform
Virtual Environments, Physics Simulation, Training Platforms
Open-source Projects
InternManip
Robotics Simulation
An all-in-one robot manipulation learning suite for training and evaluating policy models across various datasets and benchmarks.
InternData-M1
Robotics Dataset
InternData-M1 is a comprehensive embodied robotics dataset containing ~250,000 simulation demonstrations with rich frame-level annotations, including 2D/3D bounding boxes, trajectories, grasp points, and semantic masks.
Publications
2025
InternVLA-M1: Latent Spatial Grounding for Instruction-Following Robotic Manipulation
InternVLA-M1 is a unified framework for spatial grounding and robot control that advances instruction-following robots toward general-purpose intelligence. Its core idea is spatially guided vision-language-action training, where spatial grounding serves as the critical link between instructions and robot actions. InternVLA-M1 employs a two-stage pipeline: (i) spatial grounding pre-training on over 2.3M spatial reasoning examples to determine “where to act” by aligning instructions with visual, embodiment-agnostic positions, and (ii) spatially guided action post-training to decide “how to act” by generating embodiment-aware actions through plug-and-play spatial prompting.
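To make the two-stage recipe concrete, here is a minimal, hypothetical PyTorch sketch. Every name in it (VLMBackbone, GroundingHead, ActionExpert) and every shape, loss, and piece of data is invented for illustration; the actual system trains a full VLM on the 2.3M grounding examples rather than toy MLPs on random tensors. The sketch only shows the flow: pre-train a grounding head to predict points, then train an action head conditioned on those predictions as a frozen spatial prompt.

```python
# Hypothetical sketch only: class names, shapes, and losses are invented for
# illustration and are NOT InternVLA-M1's actual code or API.
import torch
import torch.nn as nn

class VLMBackbone(nn.Module):
    """Stand-in for a vision-language model that fuses image + instruction."""
    def __init__(self, in_dim=512, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))
    def forward(self, x):
        return self.net(x)

class GroundingHead(nn.Module):
    """Stage (i): predict 'where to act' as an embodiment-agnostic 2D point."""
    def __init__(self, dim=256):
        super().__init__()
        self.head = nn.Linear(dim, 2)  # (x, y) in normalized image coordinates
    def forward(self, feats):
        return self.head(feats)

class ActionExpert(nn.Module):
    """Stage (ii): decode embodiment-aware actions, conditioned on backbone
    features plus the spatial prompt produced by the grounding head."""
    def __init__(self, dim=256, action_dim=7):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 2, dim), nn.ReLU(),
                                 nn.Linear(dim, action_dim))
    def forward(self, feats, spatial_prompt):
        return self.net(torch.cat([feats, spatial_prompt], dim=-1))

backbone, grounder, actor = VLMBackbone(), GroundingHead(), ActionExpert()

# Stage (i): spatial grounding pre-training ("where to act").
opt = torch.optim.Adam([*backbone.parameters(), *grounder.parameters()], lr=1e-4)
for _ in range(100):                        # toy stand-in for ~2.3M examples
    x = torch.randn(32, 512)                # fused image+instruction features
    target_pts = torch.rand(32, 2)          # ground-truth grounding points
    loss = nn.functional.mse_loss(grounder(backbone(x)), target_pts)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage (ii): spatially guided action post-training ("how to act").
opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
for _ in range(100):
    x = torch.randn(32, 512)
    target_actions = torch.randn(32, 7)     # demonstration actions
    with torch.no_grad():                   # plug-and-play spatial prompting
        feats = backbone(x)
        prompt = grounder(feats)
    loss = nn.functional.mse_loss(actor(feats, prompt), target_actions)
    opt.zero_grad(); loss.backward(); opt.step()
```

Detaching the grounding output in stage (ii) reflects the plug-and-play framing: the spatial prompt conditions the action policy without being rewritten by the action loss.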
GenManip: A Simulation Platform for Generalizable TableTop Manipulation in the Era of MLLM
An embodied manipulation benchmark built on Isaac Sim, featuring automatic demonstration/layout generation and closed-loop evaluation. I served as a core developer and supported subsequent research at SHAILAB. To be released later.
2024
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
Proposed a semi-supervised learning framework for medical image segmentation using Mean Teachers to enhance model diversity and regularization. Achieved state-of-the-art results and demonstrated generalization across datasets.