README.md (47 additions, 11 deletions)
@@ -81,6 +81,7 @@ Building upon this database, we introduce a baseline framework named <b>Embodied

## 🔥 News

+ - \[2024-03\] We first release the data and baselines for the challenge. Please fill in the [form](https://docs.google.com/forms/d/e/1FAIpQLScUXEDTksGiqHZp31j7Zp7zlCNV7p_08uViwP_Nbzfn3g6hhw/viewform?usp=sf_link) to apply for downloading the data and try our baselines. Any feedback is welcome!
- \[2024-02\] We will co-organize the [Autonomous Grand Challenge](https://opendrivelab.com/challenge2024/) at CVPR 2024. Welcome to try the Multi-View 3D Visual Grounding track! We will release more details about the challenge with the baseline after the Chinese New Year.
- \[2023-12\] We release the [paper](./assets/EmbodiedScan.pdf) of EmbodiedScan. Please check the [webpage](https://tai-wang.github.io/embodiedscan) and view our demos!

@@ -146,8 +147,6 @@ We provide a demo for running EmbodiedScan's model on a sample scan. Please refe

## 📦 Model and Benchmark

- We will release the code for model training and benchmark with pretrained checkpoints in the 2024 Q1.
-
### Model Overview

<p align="center">
@@ -175,31 +174,68 @@ Embodied Perceptron accepts RGB-D sequence with any number of views along with t

We provide configs for different tasks [here](configs/) and you can run the train and test scripts in the [tools folder](tools/) for training and inference.

+ For example, to train a multi-view 3D detection model with PyTorch, just run:
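
The fenced command that originally followed this line is not preserved in this extract. As a rough sketch only, assuming an MMEngine-style entry point `tools/train.py` with `--work-dir`/`--launcher` options and an illustrative detection config filename (verify both against `tools/` and `configs/detection/`):

```bash
# Hypothetical sketch: distributed training with the pytorch launcher.
# The config filename is illustrative; use the detection config shipped under configs/detection/.
torchrun --nproc_per_node=8 tools/train.py \
    configs/detection/mv-det3d_8xb4_embodiedscan-9dof.py \
    --work-dir work_dirs/mv-3ddet \
    --launcher pytorch
```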

Or on a cluster with multiple machines, run the script with the slurm launcher, following the sample script provided [here](tools/mv-grounding.sh).
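
The sample slurm script itself is not reproduced here. A minimal sketch of what such a submission could look like, assuming a slurm-managed cluster, the same hypothetical `tools/train.py` entry point, and placeholder partition/GPU settings; `work_dirs/mv-grounding` is likewise a placeholder:

```bash
# Hypothetical sketch of a slurm submission; see tools/mv-grounding.sh for the actual sample script.
# PARTITION and the GPU/task counts are placeholders for your cluster setup.
srun -p ${PARTITION} --job-name=mv-grounding \
    --gres=gpu:8 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=4 \
    python tools/train.py \
    configs/grounding/mv-grounding_8xb12_embodiedscan-vg-9dof.py \
    --work-dir work_dirs/mv-grounding \
    --launcher slurm
```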

+ NOTE: To run the multi-view 3D grounding experiments, please first download the 3D detection pretrained model to accelerate its training. After downloading the detection checkpoint, please check that the path used in the config, for example the `load_from` [here](https://github.com/OpenRobotLab/EmbodiedScan/blob/main/configs/grounding/mv-grounding_8xb12_embodiedscan-vg-9dof.py#L210), is correct.
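
A quick way to double-check that value before launching (a sketch; it uses only standard shell utilities, and the expected checkpoint path is not shown in this extract):

```bash
# Print the load_from line of the grounding config so you can verify it points
# at the downloaded detection checkpoint on your machine.
grep -n "load_from" configs/grounding/mv-grounding_8xb12_embodiedscan-vg-9dof.py
```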

+ To run inference and evaluate the model (e.g., the checkpoint `work_dirs/mv-3ddet/epoch_12.pth`), just run the test script:
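
As above, the exact test command is not preserved in this extract; a minimal sketch under the assumption of a `tools/test.py` entry point and the same illustrative config name:

```bash
# Hypothetical sketch: evaluate a trained multi-view 3D detection checkpoint.
# Pass the same (illustrative) config that was used for training, plus the checkpoint path.
python tools/test.py \
    configs/detection/mv-det3d_8xb4_embodiedscan-9dof.py \
    work_dirs/mv-3ddet/epoch_12.pth
```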

Please see the [paper](./assets/EmbodiedScan.pdf) for details of our two benchmarks, fundamental 3D perception and language-grounded benchmarks. This dataset is still scaling up and the benchmark is being polished and extended. Please stay tuned for our recent updates.

+ We preliminarily provide several baseline results here with their logs and pretrained models.

+ Note that the performance is a little different from the results provided in the paper, because we re-split the original training set into the released training and validation sets while keeping the original validation set as the test set for the public benchmark.

Please see the [paper](./assets/EmbodiedScan.pdf) for more details of our two benchmarks, fundamental 3D perception and language-grounded benchmarks. This dataset is still scaling up and the benchmark is being polished and extended. Please stay tuned for our recent updates.

## 📝 TODO List

- \[x\] Release the paper and partial codes for datasets.
- \[x\] Release EmbodiedScan annotation files.
- \[x\] Release partial codes for models and evaluation.
- \[x\] Release codes for baselines and benchmarks.
- \[ \] Full release and further updates.

## 🔗 Citation

If you find our work helpful, please cite:

```bibtex
-@article{wang2023embodiedscan,
-  author={Wang, Tai and Mao, Xiaohan and Zhu, Chenming and Xu, Runsen and Lyu, Ruiyuan and Li, Peisen and Chen, Xiao and Zhang, Wenwei and Chen, Kai and Xue, Tianfan and Liu, Xihui and Lu, Cewu and Lin, Dahua and Pang, Jiangmiao},
-  title={EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI},
-  journal={Arxiv},
-  year={2023}
+@inproceedings{wang2023embodiedscan,
+  title={EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI},
+  author={Wang, Tai and Mao, Xiaohan and Zhu, Chenming and Xu, Runsen and Lyu, Ruiyuan and Li, Peisen and Chen, Xiao and Zhang, Wenwei and Chen, Kai and Xue, Tianfan and Liu, Xihui and Lu, Cewu and Lin, Dahua and Pang, Jiangmiao},
+  year={2024},
+  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},