# [CVPR 2024 Highlight] Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
## News

- **[06/12/2024]** 🔥🔥🔥 Background rendering speed-up! 3D Gaussian Splatting is integrated as a background rendering engine, rendering 50 frames within 30 s.
- **[06/12/2024]** 🔥🔥🔥 Foreground rendering speed-up! Blender rendering now runs in multiple parallel processes, rendering 50 frames within 5 minutes.
## Requirements

- Ubuntu >= 20.04 (required for Blender 3.+)
- Python >= 3.8
- PyTorch >= 1.13
- CUDA >= 11.6
- COLMAP or Metashape software (optional; we provide recalibrated poses)
- OpenAI API key (you can also use other models' APIs from NVIDIA AI for free)
## Installation

First, clone this repo recursively:

```bash
git clone https://github.com/yifanlu0227/ChatSim.git --recursive
```

Then create the conda environment:

```bash
conda create -n chatsim python=3.9 git-lfs
conda activate chatsim
```
We offer two background rendering methods: McNeRF, from our paper, and 3D Gaussian Splatting. McNeRF encodes the exposure time and achieves brightness-consistent rendering. 3D Gaussian Splatting is much faster in rendering (about 50x) and reaches higher PSNR on training views; however, strong perspective shifts result in noticeable artifacts.
- **McNeRF**: demo video `mcnerf.mp4`
- **3D Gaussian Splatting**: demo video `3dgs.mp4`
Installing either one is OK! If you want high rendering speed and do not care about brightness inconsistency, choose 3D Gaussian Splatting.
### Install McNeRF (official implementation in the paper)
```bash
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
imageio_download_bin freeimage
```
The installation is the same as F2-NeRF. Please go through the following steps.
```bash
cd chatsim/background/mcnerf/
# mcnerf uses the same data directory.
ln -s ../../../data .
```
For Debian-based Linux distributions:

```bash
sudo apt install zlib1g-dev
```

For Arch-based Linux distributions:

```bash
sudo pacman -S zlib
```

Download libtorch, taking `torch-1.13.1+cu117` as an example:
```bash
cd chatsim/background/mcnerf
cd External
# modify the version if you use a different pytorch installation
wget https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-1.13.1%2Bcu117.zip
unzip ./libtorch-cxx11-abi-shared-with-deps-1.13.1+cu117.zip
rm ./libtorch-cxx11-abi-shared-with-deps-1.13.1+cu117.zip
```
Compile the project. The minimum supported g++ version is 7.5.0.

```bash
cd ..
cmake . -B build
cmake --build build --target main --config RelWithDebInfo -j
```
Whenever the McNeRF code is modified, re-run the last two commands.
### Install 3D Gaussian Splatting
3DGS has much faster inference speed and higher rendering quality, but the HDR sky is not enabled in this case.
Installing 3DGS requires that your CUDA NVCC version matches your PyTorch CUDA version.

```bash
# Make the CUDA (nvcc) version consistent with the PyTorch CUDA version.
# First check your CUDA (nvcc) version:
nvcc -V   # for example: Build cuda_11.8.r11.8
# Go to https://pytorch.org/get-started/previous-versions/ to find a matching build.
# PyTorch itself should be >= 1.13. We list a few options here for quick setup.

# CUDA 11.6
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116

# CUDA 11.7
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117

# CUDA 11.8
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia

# CUDA 12.1
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
```

Then install the remaining dependencies and the 3DGS submodule:

```bash
pip install -r requirements.txt
imageio_download_bin freeimage
cd chatsim/background/gaussian-splatting/
pip install submodules/simple-knn
```
### Install inpainting tools

```bash
cd ../inpainting/Inpaint-Anything/
python -m pip install -e segment_anything
gdown https://drive.google.com/drive/folders/1wpY-upCo4GIW4wVPnlMh_ym779lLIG2A -O pretrained_models --folder
gdown https://drive.google.com/drive/folders/1SERTIfS7JYyOOmXWujAva4CDQf-W7fjv -O pytracking/pretrain --folder

cd ../latent-diffusion
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .

# download pretrained ldm
wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1
```
### Install Blender

We tested with Blender 3.5.1. Note that Blender 3+ requires Ubuntu >= 20.04.

```bash
cd ../../Blender
wget https://download.blender.org/release/Blender3.5/blender-3.5.1-linux-x64.tar.xz
tar -xvf blender-3.5.1-linux-x64.tar.xz
rm blender-3.5.1-linux-x64.tar.xz
```

Locate Blender's internal Python, for example `blender-3.5.1-linux-x64/3.5/python/bin/python3.10`, then install the utilities:

```bash
export blender_py=$PWD/blender-3.5.1-linux-x64/3.5/python/bin/python3.10
cd utils

# install dependencies (add -i https://pypi.tuna.tsinghua.edu.cn/simple if you are in the Chinese mainland)
$blender_py -m pip install -r requirements.txt
# $blender_py -m pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

$blender_py setup.py develop
```
### Install the trajectory tracking module (optional)

If you want smoother and more realistic trajectories, you can install the trajectory module and set the parameter `motion_agent-motion_tracking` to `True` in the scene's YAML file. To install both the code and the pre-trained model, run the following commands. This requires PyTorch >= 1.13.
```bash
pip install frozendict gym==0.26.2 stable-baselines3[extra] protobuf==3.20.1
cd chatsim/foreground
git clone --recursive git@github.com:MARMOTatZJU/drl-based-trajectory-tracking.git -b v1.0.0
cd drl-based-trajectory-tracking
source setup-minimum.sh
```
Then, whenever `motion_agent-motion_tracking` is set to `True`, each trajectory will be tracked by this module to make it smoother and more realistic.
### Train the skydome model (optional)

If you want to train the skydome model, follow the README at `chatsim/foreground/mclight/skydome_lighting/readme.md`. Otherwise, download our provided skydome HDRIs in the next section and start the simulation.
## Data Preparation

```bash
mkdir data
mkdir data/waymo_tfrecords
mkdir data/waymo_tfrecords/1.4.2
```
Download the Waymo Perception Dataset v1.4.2 to `data/waymo_tfrecords/1.4.2`. In the Google Cloud console, the correct folder path is `waymo_open_dataset_v_1_4_2/individual_files/training` or `waymo_open_dataset_v_1_4_2/individual_files/validation`. The static scenes we used are listed below. Use **Filter** to find them quickly, or use `gcloud` to download them in batch.
**gcloud CLI installation for Ubuntu 18.04+ (requires sudo):**

```bash
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates gnupg curl
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
sudo apt-get update && sudo apt-get install google-cloud-cli
# for clash proxy users, see https://blog.csdn.net/m0_53694308/article/details/134874757
gcloud init  # login
```
**Static Waymo scenes in the training set:**

```
segment-11379226583756500423_6230_810_6250_810_with_camera_labels
segment-12879640240483815315_5852_605_5872_605_with_camera_labels
segment-13196796799137805454_3036_940_3056_940_with_camera_labels
segment-14333744981238305769_5658_260_5678_260_with_camera_labels
segment-14424804287031718399_1281_030_1301_030_with_camera_labels
segment-16470190748368943792_4369_490_4389_490_with_camera_labels
segment-17761959194352517553_5448_420_5468_420_with_camera_labels
segment-4058410353286511411_3980_000_4000_000_with_camera_labels
segment-10676267326664322837_311_180_331_180_with_camera_labels
segment-1172406780360799916_1660_000_1680_000_with_camera_labels
segment-13085453465864374565_2040_000_2060_000_with_camera_labels
segment-13142190313715360621_3888_090_3908_090_with_camera_labels
segment-13238419657658219864_4630_850_4650_850_with_camera_labels
segment-13469905891836363794_4429_660_4449_660_with_camera_labels
segment-14004546003548947884_2331_861_2351_861_with_camera_labels
segment-14348136031422182645_3360_000_3380_000_with_camera_labels
segment-14869732972903148657_2420_000_2440_000_with_camera_labels
segment-15221704733958986648_1400_000_1420_000_with_camera_labels
segment-15270638100874320175_2720_000_2740_000_with_camera_labels
segment-15349503153813328111_2160_000_2180_000_with_camera_labels
segment-15365821471737026848_1160_000_1180_000_with_camera_labels
segment-15868625208244306149_4340_000_4360_000_with_camera_labels
segment-16345319168590318167_1420_000_1440_000_with_camera_labels
segment-16608525782988721413_100_000_120_000_with_camera_labels
segment-16646360389507147817_3320_000_3340_000_with_camera_labels (deprecated)
segment-3425716115468765803_977_756_997_756_with_camera_labels
segment-3988957004231180266_5566_500_5586_500_with_camera_labels
segment-8811210064692949185_3066_770_3086_770_with_camera_labels
segment-9385013624094020582_2547_650_2567_650_with_camera_labels
```
**Static Waymo scenes in the validation set:**

```
segment-10247954040621004675_2180_000_2200_000_with_camera_labels
segment-10061305430875486848_1080_000_1100_000_with_camera_labels
segment-10275144660749673822_5755_561_5775_561_with_camera_labels
```
If you have installed `gcloud`, you can download the above tfrecords via:

```bash
bash data_utils/download_waymo.sh data_utils/waymo_static_32.lst data/waymo_tfrecords/1.4.2
```
After downloading the tfrecords, you should see a folder structure like the following. If you downloaded the tfrecord files from the console, they will also carry prefixes like `individual_files_training_` or `individual_files_validation_`.
```
data
|-- ...
|-- ...
`-- waymo_tfrecords
    `-- 1.4.2
        |-- segment-10247954040621004675_2180_000_2200_000_with_camera_labels.tfrecord
        |-- segment-11379226583756500423_6230_810_6250_810_with_camera_labels.tfrecord
        |-- ...
        `-- segment-1172406780360799916_1660_000_1680_000_with_camera_labels.tfrecord
```

We extract the images, camera poses, LiDAR files, etc. from the tfrecord files with `data_utils/process_waymo_script.py`:
```bash
cd data_utils
python process_waymo_script.py --waymo_data_dir=../data/waymo_tfrecords/1.4.2 --nerf_data_dir=../data/waymo_multi_view
```

This will generate the data folder `data/waymo_multi_view`.
### Download our recalibrated files
```bash
cd ../data

# calibration files from Metashape
# you can also download manually from https://drive.google.com/file/d/1ms4yhjH5pEDMhyf_CfzNEYq5kj4HILki/view?usp=sharing
gdown 1ms4yhjH5pEDMhyf_CfzNEYq5kj4HILki
unzip recalibrated_poses.zip
rsync -av recalibrated_poses/ waymo_multi_view/
rm -r recalibrated_poses*

# if you use 3D Gaussian Splatting, you also need the following files:
# calibration files from COLMAP, plus the point cloud for 3DGS training
# you can also download manually from https://huggingface.co/datasets/yifanlu/waymo_recalibrated_poses_colmap/tree/main
git lfs install
git clone https://huggingface.co/datasets/yifanlu/waymo_recalibrated_poses_colmap
cd waymo_recalibrated_poses_colmap  # enter the repo so git lfs can fetch the archive
git lfs pull  # ~ 2 GB
tar xvf waymo_recalibrated_poses_colmap.tar
cd ..
rsync -av waymo_recalibrated_poses_colmap/waymo_multi_view/ waymo_multi_view/
rm -rf waymo_recalibrated_poses_colmap
```
### Or recalibrate by yourself
If you want to do the recalibration yourself, use COLMAP or Metashape to calibrate the images in the `data/waymo_multi_view/{SCENE_NAME}/images` folder and convert them back to the Waymo world coordinate system. Please follow the tutorial in `data_utils/README.md`. The final camera extrinsics and intrinsics are stored as `cams_meta.npy` (Metashape case) or `colmap/sparse_undistorted/cams_meta.npy` (COLMAP case, necessary for 3DGS training).
The final data folder will look like this:
```
data
`-- waymo_multi_view
    |-- ...
    `-- segment-1172406780360799916_1660_000_1680_000_with_camera_labels
        |-- 3d_boxes.npy              # 3D bounding boxes of the first frame
        |-- images                    # a clip of waymo images used in chatsim (typically 40 frames)
        |-- images_all                # full waymo images (typically 198 frames)
        |-- map.pkl                   # map data of this scene
        |-- point_cloud               # point cloud file of the first frame
        |-- cams_meta.npy             # camera ext/int calibrated by Metashape, transformed to the Waymo coordinate system
        |-- cams_meta_metashape.npy   # camera ext/int calibrated by Metashape (intermediate file, relative scale, not required for simulation inference)
        |-- cams_meta_colmap.npy      # camera ext/int calibrated by COLMAP (intermediate file, relative scale, not required for simulation inference)
        |-- cams_meta_waymo.npy       # camera ext/int from the original Waymo dataset (intermediate file, not required for simulation inference)
        |-- shutters                  # normalized exposure time (mean=0, std=1)
        |-- tracking_info.pkl         # tracking data
        |-- vehi2veh0.npy             # transformation from the i-th frame's vehicle coordinate to the first frame's vehicle coordinate
        |-- camera.xml                # calibration file from Metashape (intermediate file, not required for simulation inference)
        `-- colmap/sparse_undistorted/[images/cams_meta.npy/points3D_waymo.ply]  # calibration files from COLMAP (intermediate, only required for 3DGS rendering)
```
### Coordinate Conventions

- Points in `point_cloud/000_xxx.pcd` are in the ego vehicle's coordinate frame.
- Camera poses in `camera.xml` follow the RDF convention (x-right, y-down, z-front).
- Camera poses in `cams_meta.npy` follow the RUB convention (x-right, y-up, z-back).
- `vehi2veh0.npy` stores transformations between vehicle coordinate frames; vehicle coordinates follow the FLU convention (x-front, y-left, z-up), as illustrated in the Waymo paper (see the sketch below).
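To make these conventions concrete, here is a minimal, hypothetical sketch (the per-frame 4x4 layout of `vehi2veh0.npy` and the file names are illustrative assumptions, not documented specifics):

```python
import numpy as np

# Convert a camera-to-world pose from RUB (cams_meta.npy) to RDF (camera.xml):
# x stays right, y flips (up -> down), z flips (back -> front).
RUB_TO_RDF = np.diag([1.0, -1.0, -1.0])

def rub_pose_to_rdf(pose_rub: np.ndarray) -> np.ndarray:
    """pose_rub: (3, 4) camera-to-world matrix whose camera axes are RUB."""
    pose_rdf = pose_rub.copy()
    pose_rdf[:, :3] = pose_rub[:, :3] @ RUB_TO_RDF  # re-express the camera axes
    return pose_rdf

# Map a homogeneous point from the i-th frame's FLU vehicle frame into frame 0's:
vehi2veh0 = np.load("vehi2veh0.npy")        # assumed shape: (N, 4, 4)
point_i = np.array([10.0, 2.0, 0.5, 1.0])   # x-front, y-left, z-up, homogeneous
point_0 = vehi2veh0[5] @ point_i            # the same point in frame 0's frame
```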
### `cams_meta.npy` format

- `cams_meta.shape = (N, 27)`
- `cams_meta[:, 0:12]`: flattened camera poses in RUB; the world coordinate is the starting frame's vehicle coordinate.
- `cams_meta[:, 12:21]`: flattened camera intrinsics
- `cams_meta[:, 21:25]`: distortion params `[k1, k2, p1, p2]`
- `cams_meta[:, 25:27]`: bounds `[z_near, z_far]` (not used; see the unpacking sketch below)
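For reference, a minimal sketch of unpacking these fields (the row-major `(3, 4)` and `(3, 3)` reshapes are assumptions based on the shapes above):

```python
import numpy as np

cams_meta = np.load("cams_meta.npy")           # shape (N, 27)
poses = cams_meta[:, 0:12].reshape(-1, 3, 4)   # camera-to-world, RUB axes
intrinsics = cams_meta[:, 12:21].reshape(-1, 3, 3)
distortion = cams_meta[:, 21:25]               # [k1, k2, p1, p2]
bounds = cams_meta[:, 25:27]                   # [z_near, z_far] (not used)
```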
- Blender assets. Download with the following command and make sure the files end up in `data/blender_assets`.
```bash
# suppose you are in ChatSim/data
git lfs install
git clone https://huggingface.co/datasets/yifanlu/Blender_3D_assets
cd Blender_3D_assets
git lfs pull  # about 1 GB. You might see `Error updating the Git index: (1/1), 1.0 GB | 7.4 MB/s` when `git lfs pull` finishes. It doesn't matter; please continue.
cd ..
mv Blender_3D_assets/assets.zip ./
unzip assets.zip
rm assets.zip
rm -rf Blender_3D_assets
mv assets blender_assets
```
Our 3D models are collected from the Internet. We tried our best to contact the authors and ensure that copyright issues are properly handled (our open-source project is not for profit). If you are the author of a model and our use of it infringes your copyright, please contact us immediately and we will remove it.
- Skydome HDRI. Download with the following command and make sure the files end up in `data/waymo_skydome`.
```bash
# suppose you are in ChatSim/data
git lfs install
git clone https://huggingface.co/datasets/yifanlu/Skydome_HDRI
mv Skydome_HDRI/waymo_skydome ./
rm -rf Skydome_HDRI
```

You can also train the skydome estimation network yourself. Go to `chatsim/foreground/mclight/skydome_lighting` and follow `chatsim/foreground/mclight/skydome_lighting/readme.md` for the training.
## Training

Train either McNeRF or 3D Gaussian Splatting, depending on your installation.
### Train McNeRF
```bash
cd chatsim/background/mcnerf
```

Make sure the `data` folder is linked to `../../../data`. If it isn't, run `ln -s ../../../data data`. Then train your model with:
```bash
python scripts/run.py --config-name=wanjinyou_big \
  dataset_name=waymo_multi_view case_name=${CASE_NAME} \
  exp_name=${EXP_NAME} dataset.shutter_coefficient=0.15 mode=train_hdr_shutter +work_dir=$(pwd)
```
where `${CASE_NAME}` is a scene name like `segment-11379226583756500423_6230_810_6250_810_with_camera_labels` and `${EXP_NAME}` can be anything, e.g. `exp_coeff_0.15`. Both `dataset.shutter_coefficient = 0.15` and `dataset.shutter_coefficient = 0.3` work well.
You can simply run scripts like `bash train-1137.sh` for training and `bash render_novel_view-1137.sh` for testing.
### Train 3D Gaussian Splatting
```bash
cd chatsim/background/gaussian-splatting
```

Make sure the `data` folder is linked to `../../../data`. If it isn't, run `ln -s ../../../data data`. Then train your model with:
```bash
# example
SCENE_NAME=segment-11379226583756500423_6230_810_6250_810_with_camera_labels
python train.py --config configs/chatsim/original.yaml source_path=data/waymo_multi_view/${SCENE_NAME}/colmap/sparse_undistorted model_path=output/${SCENE_NAME}

# rendering
python render.py -m output/${SCENE_NAME}
```
You can simply run scripts like `bash train-1137.sh` for training.
## Usage

Set the API key as an environment variable. Also set `OPENAI_API_BASE` if you have network issues (especially in the Chinese mainland):
```bash
export OPENAI_API_KEY=<your api key>
```
Now you can start the simulation with
```bash
python main.py -y ${CONFIG YAML} \
               -p ${PROMPT} \
               [-s ${SIMULATION NAME}]
```
- `${CONFIG YAML}` specifies the scene information; the YAML files are stored in the `config` folder, e.g. `config/waymo-1137.yaml`.
- `${PROMPT}` is your input prompt, which should be wrapped in quotation marks, e.g. `"add a straight driving car in the scene"`.
- `${SIMULATION NAME}` determines the name of the folder when saving results. Default: `demo`.
You can try
```bash
# if you trained McNeRF
python main.py -y config/waymo-1137.yaml -p "Add a Benz G in front of me, driving away fast."

# if you trained 3DGS
python main.py -y config/3dgs-waymo-1137.yaml -p "Add a Benz G in front of me, driving away fast."
```
The rendered results are saved in `results/1137_demo_%Y_%m_%d_%H_%M_%S`. Intermediate files are saved in `results/cache/1137_demo_%Y_%m_%d_%H_%M_%S` for debugging and visualization if `save_cache` is enabled in `config/waymo-1137.yaml`.
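Because every run writes to a fresh timestamped folder, a small helper like the following (hypothetical, relying only on the naming pattern above) can locate the most recent results:

```python
from pathlib import Path

# Folders are named <scene>_<simulation name>_%Y_%m_%d_%H_%M_%S, so the
# lexicographically largest name is also the most recent run.
latest = max(Path("results").glob("1137_demo_*"), key=lambda p: p.name)
print(latest)
```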
`config/waymo-1137.yaml` contains a detailed explanation of each entry. We give some extra explanation here. Suppose the YAML is read into `config_dict` (a loading sketch follows the list below):
- `config_dict['scene']['is_wide_angle']` determines the rendering view. If set to `True`, we expand Waymo's intrinsics (width -> 3 x width) to render wide-angle images. Note that `is_wide_angle = True` comes with `rendering_mode = 'render_wide_angle_hdr_shutter'`, and `is_wide_angle = False` comes with `rendering_mode = 'render_hdr_shutter'`.
- `config_dict['scene']['frames']` is the number of frames to render.
- `config_dict['agents']['background_rendering_agent']['nerf_quiet_render']` determines whether to print McNeRF's output to the terminal. Set it to `False` for debugging.
- `config_dict['agents']['foreground_rendering_agent']['use_surrounding_lighting']` defines whether we use the surrounding lighting. Currently `use_surrounding_lighting = True` only takes effect when exactly one vehicle is added, because HDRI is a global illumination in Blender and it is difficult to set a separate HDRI for each car. `use_surrounding_lighting = True` can also lead to slow rendering, since it calls NeRF `#frame` times. We set it to `False` in each default YAML.
- `config_dict['agents']['foreground_rendering_agent']['skydome_hdri_idx']` is the filename (without extension) chosen from `data/waymo_skydome/${SCENE_NAME}/`. By default it is the skydome HDRI estimated from the first frame (`'000'`), but you can manually select a better estimation from another frame. To view an HDRI, we recommend VERIV for VS Code and tev for desktop environments.
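To inspect these entries programmatically, the YAML can be loaded into `config_dict` as the bullets above assume (a sketch; the key layout follows the explanation above):

```python
import yaml  # pip install pyyaml

with open("config/waymo-1137.yaml") as f:
    config_dict = yaml.safe_load(f)

print(config_dict['scene']['is_wide_angle'])
print(config_dict['scene']['frames'])
print(config_dict['agents']['foreground_rendering_agent']['skydome_hdri_idx'])
```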
## Todo

- arXiv paper release
- code and model release
- motion tracking module drl-based-trajectory-tracking (to smooth trajectories)
- multi-round wrapper code
## Citation

```bibtex
@InProceedings{wei2024editable,
  title     = {Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents},
  author    = {Yuxi Wei and Zi Wang and Yifan Lu and Chenxin Xu and Changxing Liu and Hao Zhao and Siheng Chen and Yanfeng Wang},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2024},
}
```