EgoThinker
This repo is the official implementation of EgoThinker at NeurIPS 2025
"Unveiling Egocentric Reasoning withSpatio-Temporal CoT"
Baoqi Pei, Yifei Huang, Jilan Xu, Yuping He, Guo Chen,
Fei Wu, Yu Qiao, Jiangmiao Pang
⭐️: We are also working on an updated version for spatial understanding and embodied QA; stay tuned!
Egocentric video reasoning focuses on the unseen, egocentric agent who shapes the scene, demanding inference of hidden intentions and fine-grained interactions—areas where current MLLMs struggle. We present EgoThinker, a framework that equips MLLMs with strong egocentric reasoning via spatio-temporal chain-of-thought supervision and a two-stage curriculum. We build EgoRe-5M, a large-scale QA dataset derived from 13M egocentric clips, featuring multi-minute segments with detailed rationales and dense hand–object grounding. Trained with SFT on EgoRe-5M and refined with RFT for better spatio-temporal localization, EgoThinker outperforms prior methods on multiple egocentric benchmarks and yields substantial gains in fine-grained localization tasks.
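To make the supervision format concrete, here is an illustrative sketch of what one EgoRe-5M-style QA sample might contain. All field names and values are hypothetical: the paper describes multi-minute segments with detailed rationales and dense hand–object grounding, but the actual schema is defined by the released data.

```python
# Illustrative only: a hypothetical schema for one EgoRe-5M-style QA sample.
# Field names are our own; consult the released training data for the real format.
sample = {
    "video": "clip_000123.mp4",               # multi-minute egocentric segment
    "question": "Why does the camera wearer reach toward the counter?",
    "rationale": (                            # spatio-temporal chain-of-thought
        "From 00:12 to 00:15 the right hand moves toward the cutting board; "
        "the knife on the counter suggests the wearer intends to resume chopping."
    ),
    "answer": "To pick up the knife and continue chopping vegetables.",
    "grounding": [                            # dense hand–object grounding
        {"t": [12.0, 15.0], "hand": "right", "object": "knife",
         "box": [0.41, 0.55, 0.63, 0.78]},    # normalized xyxy at the keyframe
    ],
}
```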
- 2025-11-25: We released Evaluation Data, including data derived from Visor for spatial grounding and the EgoExoLearn dataset for temporal grounding.
- 2025-10-29: We released the EgoThinker-v1 checkpoint and training data.
- 2025-10-28: We released our paper and code.
This repo contains three parts:
- EgoThinker-SFT: SFT training code for EgoThinker.
- EgoThinker-RFT: RFT training code for EgoThinker.
- lmms-eval: Evaluation code for egocentric and embodied QA benchmarks.
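If you just want to query the released checkpoint, the sketch below shows one plausible path. It is not the repo's official entry point: we assume the EgoThinker-v1 checkpoint keeps the Qwen2-VL interface it is built on, and the model ID and clip path are placeholders.

```python
# Minimal inference sketch (assumptions: EgoThinker-v1 follows the Qwen2-VL
# interface; the model ID and video path below are hypothetical placeholders).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper shipped with Qwen-VL

MODEL_ID = "InternRobotics/EgoThinker-v1"  # hypothetical Hugging Face model ID

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "egocentric_clip.mp4"},  # placeholder clip
        {"type": "text", "text": "What is the camera wearer about to do, and why?"},
    ],
}]

# Build the chat prompt and extract video frames the processor expects.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)  # should include a spatio-temporal rationale before the final answer
```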
We welcome feedback and issues. Thank you for trying EgoThinker!
@misc{pei2025egothinkerunveilingegocentricreasoning,
      title={EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT},
      author={Baoqi Pei and Yifei Huang and Jilan Xu and Yuping He and Guo Chen and Fei Wu and Yu Qiao and Jiangmiao Pang},
      year={2025},
      eprint={2510.23569},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.23569},
}
Our code is built upon the following projects:
- Qwen-VL — https://github.com/QwenLM/Qwen3-VL
- VideoChat-R1 — https://github.com/OpenGVLab/VideoChat-R1
- lmms-eval — https://github.com/EvolvingLMMs-Lab/lmms-eval
