Aether: Geometric-Aware Unified World Modeling


Aether addresses a fundamental challenge in AI: integrating geometric reconstruction with generative modeling for human-like spatial reasoning. Our framework unifies three core capabilities: (1) 🌏 4D dynamic reconstruction, (2) 🎬 action-conditioned video prediction, and (3) 🎯 goal-conditioned visual planning. Trained entirely on synthetic data, Aether achieves strong zero-shot generalization to real-world scenarios.

Teaser

🥳 NEWS:

  • Oct. 22nd, 2025: Aether won the Outstanding Paper Award at the ICCV 2025 RIWM workshop!
  • Jun. 26th, 2025: Aether is accepted by ICCV 2025!
  • Jun. 3rd, 2025: DeepVerse is released! It is a 4D auto-regressive world model. Check it out!
  • Mar. 31st, 2025: The Gradio demo is available! You can deploy it locally or try Aether online on Hugging Face.
  • Mar. 28th, 2025: AetherV1 is released! Model checkpoints, paper, website, and inference code are all available.

🔨 Installation

Note: We recommend using virtual environments such as Anaconda.

    # clone project
    git clone https://github.com/OpenRobotLab/Aether.git
    cd Aether

    # create conda environment
    conda create -n aether python=3.10
    conda activate aether

    # install dependencies
    pip install -r requirements.txt

🚀 Inference

Warning: When doing reconstruction, the Aether pipeline automatically center-crops the input video if its size does not match 480x720. Therefore, for evaluation purposes, we have to slide a 480p window over both the spatial and temporal dimensions, and blend all windows' outputs both spatially and temporally. Examples of video depth and camera pose evaluation can be found at evaluation/.
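For illustration only, here is a minimal sketch of such spatio-temporal sliding-window blending with simple averaging; the window size, strides, and the `run_aether` callable are placeholder assumptions, and the actual evaluation code lives under evaluation/.

```python
# Hypothetical sketch of spatio-temporal sliding-window blending (uniform averaging).
# Window/stride values and `run_aether` are placeholders, not the repo's actual API.
import numpy as np

def sliding_window_blend(video, run_aether, win=(41, 480, 720), stride=(20, 240, 360)):
    """video: (T, H, W, 3) array; run_aether(clip) -> (t, h, w) per-pixel prediction."""
    T, H, W, _ = video.shape
    out = np.zeros((T, H, W), dtype=np.float64)
    weight = np.zeros((T, H, W), dtype=np.float64)

    def starts(total, size, step):
        last = max(total - size, 0)
        s = list(range(0, last + 1, step))
        if s[-1] != last:  # make sure the final window reaches the end
            s.append(last)
        return s

    for t0 in starts(T, win[0], stride[0]):
        for y0 in starts(H, win[1], stride[1]):
            for x0 in starts(W, win[2], stride[2]):
                clip = video[t0:t0 + win[0], y0:y0 + win[1], x0:x0 + win[2]]
                pred = run_aether(clip)  # prediction for this window, same (t, h, w) extent
                out[t0:t0 + win[0], y0:y0 + win[1], x0:x0 + win[2]] += pred
                weight[t0:t0 + win[0], y0:y0 + win[1], x0:x0 + win[2]] += 1.0

    return out / np.clip(weight, 1e-6, None)
```

The official scripts may use different strides or non-uniform (feathered) blending weights; this sketch only conveys the accumulate-and-normalize idea.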

Run inference demo locally

  • 4D reconstruction:

    python scripts/demo.py --task reconstruction --video ./assets/example_videos/moviegen.mp4
  • Action-conditioned video prediction:

    python scripts/demo.py --task prediction --image ./assets/example_obs/car.png --raymap_action assets/example_raymaps/raymap_forward_right.npy
  • Goal-conditioned visual planning:

    python scripts/demo.py --task planning --image ./assets/example_obs_goal/01_obs.png --goal ./assets/example_obs_goal/01_goal.png

Results will be saved in ./outputs/ by default.

Run inference demo with Gradio

The Gradio demo provides an interactive web-based Aether experience.

python scripts/demo_gradio.py

Our local testing environment uses an A100 GPU with 80GB of memory, and the demo runs on local port 7860 by default.

Inference with your own raymap action

If you have a sequence of camera poses, you must convert it to a raymap action trajectory before running inference with Aether. Note that your camera poses should be expressed within the camera coordinate system of the first frame. You can use the camera_pose_to_raymap function in postprocess_utils.py.
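If your poses are given in an arbitrary world frame, the sketch below shows one way to rebase them onto the first frame; the array name `c2w` and the camera-to-world convention are assumptions for illustration.

```python
import numpy as np

# Hypothetical example: c2w is an (N, 4, 4) stack of camera-to-world matrices in
# some arbitrary world frame (here just N identity poses as a stand-in).
N = 41
c2w = np.tile(np.eye(4), (N, 1, 1))

# Rebase every pose into the coordinate system of the first frame, so that
# camera_pose[0] becomes the identity, as expected before converting to a raymap.
camera_pose = np.linalg.inv(c2w[0]) @ c2w  # (N, 4, 4)
```

If your poses are stored as world-to-camera matrices instead, invert them before rebasing.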

    import numpy as np
    from aether.utils.postprocess_utils import camera_pose_to_raymap

    # suppose you have the ground-truth depth values:
    disparity = 1. / depth[depth > 0]
    dmax = disparity.max()
    # otherwise, you can set dmax to 1.0 by default:
    dmax = 1.0

    # then suppose we have a camera trajectory
    # camera_pose: shape of (N, 4, 4), e.g. N = 41
    # intrinsic: shape of (N, 3, 3), e.g. N = 41

    # we will get a raymap sequence of shape (N, 6, h, w),
    # where h = image height // 8 and w = image width // 8
    raymap = camera_pose_to_raymap(camera_pose=camera_pose, intrinsic=intrinsic, dmax=dmax)

    # save the raymap
    np.save("/path/to/your/raymap.npy", raymap)
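Once saved, the raymap can be fed to the action-conditioned prediction demo through the --raymap_action flag shown earlier (the observation image path below is a placeholder):

    python scripts/demo.py --task prediction --image /path/to/your/observation.png --raymap_action /path/to/your/raymap.npy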

📝 Citation

If you find this work useful in your research, please consider citing:

    @article{aether,
      title   = {Aether: Geometric-Aware Unified World Modeling},
      author  = {Aether Team and Haoyi Zhu and Yifan Wang and Jianjun Zhou and Wenzheng Chang and Yang Zhou and Zizun Li and Junyi Chen and Chunhua Shen and Jiangmiao Pang and Tong He},
      journal = {arXiv preprint arXiv:2503.18945},
      year    = {2025}
    }

💡 Limitations

Aether represents an initial step in our journey and is trained entirely on synthetic data. While it demonstrates promising capabilities, it is important to be aware of its current limitations:

  • 🔄 Aether struggles with highly dynamic scenarios, such as those involving significant motion or dense crowds.
  • 📸 Its camera pose estimation can be less stable in certain conditions.
  • 📐 For visual planning tasks, we recommend keeping the observations and goals relatively close to ensure optimal performance.

We are actively working on the next generation of Aether and are committed to addressing these limitations in future releases.

📚 License

This repository is licensed under the MIT License - see the LICENSE file for details. For any questions, please email tonghe90[at]gmail[dot]com.

✨ Acknowledgements

Our work is primarily built upon Accelerate, Diffusers, CogVideoX, Finetrainers, DepthAnyVideo, CUT3R, MonST3R, VBench, GST, SPA, DroidCalib, Grounded-SAM-2, ceres-solver, etc. We extend our gratitude to all these authors for their generously open-sourced code and their significant contributions to the community.

