This is a Tensorflow implementation of our paper:
GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)
Zhichao Yin and Jianping Shi
arXiv preprint: https://arxiv.org/abs/1803.02276
This code has been tested with Python 2.7, TensorFlow 1.1 and CUDA 8.0 on Ubuntu 16.04.
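If you need to recreate a matching environment, a minimal sketch is below; the virtualenv tool and the extra packages are assumptions, only the Python/TensorFlow/CUDA versions come from the note above.

```bash
# A possible environment setup (assumed); only the version numbers are from the README.
virtualenv -p python2.7 geonet-env
source geonet-env/bin/activate
pip install tensorflow-gpu==1.1.0   # GPU build matching CUDA 8.0
pip install numpy scipy matplotlib  # common dependencies of the data/eval scripts (assumed)
```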
For replicating our results in all three tasks (monocular depth, camera pose and optical flow), you need to download the following datasets and preprocess them into the required format:
For the depth and flow tasks, the training data is the KITTI raw dataset, which you can download with the official script (see the sketch after this list);
For the pose task, the training data is the KITTI odometry dataset; you should download the calibration files as well as the ground truth poses (for evaluation).
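As a hedged example of the raw data download, the KITTI website ships its official downloader as a zip archive containing a shell script; the file names below reflect that distribution and are not part of this repo.

```bash
# Unpack the official downloader obtained from the KITTI website (file names assumed)
unzip raw_data_downloader.zip -d kitti_raw_downloader
cd kitti_raw_downloader
# Downloads and extracts every raw sequence into the current directory
# (expect several hundred GB of data and a long runtime).
bash raw_data_downloader.sh
```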
After downloading the data, you can run the following command for preprocessing:
python data/prepare_train_data.py --dataset_dir=/path/to/kitti/dataset/ --dataset_name=kitti_split --dump_root=/path/to/formatted/data/ --seq_length=3 --img_height=128 --img_width=416 --num_threads=16 --remove_static
For the depth task, --dataset_name should be kitti_raw_eigen and --seq_length should be set to 3;
For the flow task, --dataset_name should be kitti_raw_stereo and --seq_length should be set to 3;
For the pose task, --dataset_name should be kitti_odom and --seq_length should be set to 5 (example invocations are sketched right after this list).
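The sketches below fill in those settings; the /data and /formatted paths are placeholders, and the remaining flags are copied verbatim from the template command above.

```bash
# Depth: Eigen split of KITTI raw, 3-frame snippets
python data/prepare_train_data.py --dataset_dir=/data/kitti_raw/ --dataset_name=kitti_raw_eigen \
  --dump_root=/formatted/kitti_eigen/ --seq_length=3 --img_height=128 --img_width=416 \
  --num_threads=16 --remove_static

# Flow: stereo split of KITTI raw, 3-frame snippets
python data/prepare_train_data.py --dataset_dir=/data/kitti_raw/ --dataset_name=kitti_raw_stereo \
  --dump_root=/formatted/kitti_stereo/ --seq_length=3 --img_height=128 --img_width=416 \
  --num_threads=16 --remove_static

# Pose: KITTI odometry, 5-frame snippets
python data/prepare_train_data.py --dataset_dir=/data/kitti_odom/ --dataset_name=kitti_odom \
  --dump_root=/formatted/kitti_odom/ --seq_length=5 --img_height=128 --img_width=416 \
  --num_threads=16 --remove_static
```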
You can optionally pretrain the model on the Cityscapes dataset for any of the three tasks. The required training data is the image sequence archive leftImg8bit_sequence_trainvaltest.zip and the calibration file camera_trainvaltest.zip. After downloading them, simply run:
python data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/formatted/data/ --seq_length=3 --img_height=171 --img_width=416 --num_threads=16
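If you then want to fine-tune a Cityscapes-pretrained model on KITTI, one possible sketch is below; whether train_rigid honors --init_ckpt_file for warm-starting is an assumption to verify in geonet_main.py, and the checkpoint path is a placeholder.

```bash
# Fine-tune on formatted KITTI data, initializing from a Cityscapes-pretrained checkpoint.
# The checkpoint prefix is a placeholder, and warm-starting via --init_ckpt_file in
# train_rigid mode is an assumption to confirm against geonet_main.py.
python geonet_main.py --mode=train_rigid --dataset_dir=/formatted/kitti_eigen/ \
  --checkpoint_dir=/ckpts/kitti_finetune/ --init_ckpt_file=/ckpts/cityscapes/model-100000 \
  --learning_rate=0.0002 --seq_length=3 --batch_size=4 --max_steps=350000
```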
Our code supports two training modes, corresponding to our stage-wise training strategy. The train_rigid mode is mainly for learning depth and pose, while the train_flow mode supports direct or residual flow learning.
For train_rigid mode (depth and pose tasks), run the command
python geonet_main.py --mode=train_rigid --dataset_dir=/path/to/formatted/data/ --checkpoint_dir=/path/to/save/ckpts/ --learning_rate=0.0002 --seq_length=3 --batch_size=4 --max_steps=350000
You can switch the network encoder by setting the --dispnet_encoder flag, or perform depth scale normalization (see this paper for details) by setting --scale_normalize to True. Note that for replicating the depth and pose results, --seq_length should be 3 and 5 respectively.
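For concreteness, the two train_rigid variants used for replication might look like the following sketch; all paths are placeholders and the flag values are taken from the command and notes above.

```bash
# Depth: 3-frame snippets, optionally with depth scale normalization enabled
python geonet_main.py --mode=train_rigid --dataset_dir=/formatted/kitti_eigen/ \
  --checkpoint_dir=/ckpts/depth/ --learning_rate=0.0002 --seq_length=3 \
  --batch_size=4 --max_steps=350000 --scale_normalize=True

# Pose: same mode on the odometry split, 5-frame snippets
python geonet_main.py --mode=train_rigid --dataset_dir=/formatted/kitti_odom/ \
  --checkpoint_dir=/ckpts/pose/ --learning_rate=0.0002 --seq_length=5 \
  --batch_size=4 --max_steps=350000
```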
For train_flow mode (flow task), run the command
python geonet_main.py --mode=train_flow --dataset_dir=/path/to/formatted/data/ --checkpoint_dir=/path/to/save/ckpts/ --learning_rate=0.0002 --seq_length=3 --flownet_type=direct --max_steps=400000
You can choose to learn direct or residual flow by setting the --flownet_type flag. Note that when --flownet_type is residual, --init_ckpt_file should be specified to point at a model pretrained on the same dataset in train_rigid mode (see the sketch below). Also, a max_steps of more than 200 epochs is preferred for learning residual flow.
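A sketch of the residual variant, warm-starting from a placeholder checkpoint produced by a prior train_rigid run on the same data:

```bash
# Residual flow: initialize from a model previously trained in train_rigid mode on the
# same kitti_raw_stereo-formatted data; the checkpoint path is a placeholder.
python geonet_main.py --mode=train_flow --dataset_dir=/formatted/kitti_stereo/ \
  --checkpoint_dir=/ckpts/flow_residual/ --init_ckpt_file=/ckpts/rigid/model-350000 \
  --learning_rate=0.0002 --seq_length=3 --flownet_type=residual --max_steps=400000
```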
You can download our pretrained models as well as their predictions for all three tasks from [Google Drive]. Note that they were trained on different splits of KITTI as described in the paper. Following the testing and evaluation instructions below, you should obtain results similar to those reported in the paper.
Following most of the original hyperparameters but setting --scale_normalize to True, we have trained a better DepthNet on the Eigen split of KITTI. This pretrained model is also provided, namely model_sn in the geonet_depthnet subfolder. Note that this is not included in our paper, but the performance is further improved:
Abs Rel | Sq Rel | RMSE | RMSE (log) | Acc (δ<1.25) | Acc (δ<1.25²) | Acc (δ<1.25³) |
---|---|---|---|---|---|---|
0.149 | 1.060 | 5.567 | 0.226 | 0.796 | 0.935 | 0.975 |
We provide testing and evaluation scripts for all three tasks.
For the depth task, run the following command
python geonet_main.py --mode=test_depth --dataset_dir=/path/to/kitti/raw/dataset/ --init_ckpt_file=/path/to/trained/model/ --batch_size=1 --depth_test_split=eigen --output_dir=/path/to/save/predictions/
Then you can evaluate the predictions by running
python kitti_eval/eval_depth.py --split=eigen --kitti_dir=/path/to/kitti/raw/dataset/ --pred_file=/path/to/predictions/
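Note that --pred_file should point at the prediction file written by the test step above. A quick sanity check, assuming the predictions are stored as a single NumPy array whose file name is chosen by the test script (the name below is only an illustration):

```bash
# Inspect the saved depth predictions; replace the .npy name with whatever
# test_depth actually wrote into your --output_dir (the name here is an assumption).
python -c "import numpy as np; d = np.load('/path/to/save/predictions/model.npy'); print(d.shape)"
```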
For the pose task, assuming you have downloaded the KITTI odometry dataset (including ground truth poses), first run
python geonet_main.py --mode=test_pose --dataset_dir=/path/to/kitti/odom/dataset/ --init_ckpt_file=/path/to/trained/model/ --batch_size=1 --seq_length=5 --pose_test_seq=9 --output_dir=/path/to/save/predictions/
Now you have the predicted pose snippets. You can generate the ground truth pose snippets by running
python kitti_eval/generate_pose_snippets.py --dataset_dir=/path/to/kitti/odom/dataset/ --output_dir=/path/to/save/gtruth/pose/snippets/ --seq_id=09 --seq_length=5
Then you can evaluate your predictions by
python kitti_eval/eval_pose.py --gtruth_dir=/path/to/gtruth/pose/snippets/ --pred_dir=/path/to/predicted/pose/snippets/
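The paper evaluates pose on odometry sequences 09 and 10; a hedged sketch chaining the three steps above for both test sequences, reusing the placeholder paths, is:

```bash
# Predict, build ground truth snippets, and evaluate for test sequences 09 and 10.
# --pose_test_seq takes the bare number (9, 10) as in the command above,
# while --seq_id keeps the zero-padded form (09, 10).
for SEQ in 09 10; do
  python geonet_main.py --mode=test_pose --dataset_dir=/path/to/kitti/odom/dataset/ \
    --init_ckpt_file=/path/to/trained/model/ --batch_size=1 --seq_length=5 \
    --pose_test_seq=${SEQ#0} --output_dir=/path/to/save/predictions/$SEQ/
  python kitti_eval/generate_pose_snippets.py --dataset_dir=/path/to/kitti/odom/dataset/ \
    --output_dir=/path/to/save/gtruth/pose/snippets/$SEQ/ --seq_id=$SEQ --seq_length=5
  python kitti_eval/eval_pose.py --gtruth_dir=/path/to/save/gtruth/pose/snippets/$SEQ/ \
    --pred_dir=/path/to/save/predictions/$SEQ/
done
```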
For the flow task, you first need to download the KITTI flow 2015 dataset and its multi-view extension. For replicating our flow results in the paper, a seq_length of 3 is recommended. You need to format the testing data by running
python kitti_eval/generate_multiview_extension.py --dataset_dir=/path/to/data_scene_flow_multiview/ --calib_dir=/path/to/data_scene_flow_calib/ --dump_root=/path/to/formatted/testdata/ --cam_id=02 --seq_length=3
Then you can test your trained model by
python geonet_main.py --mode=test_flow --dataset_dir=/path/to/formatted/testdata/ --init_ckpt_file=/path/to/trained/model/ --flownet_type=direct --batch_size=1 --output_dir=/path/to/save/predictions/
We again provide an evaluation script:
python kitti_eval/eval_flow.py --dataset_dir=/path/to/kitti_stereo_2015/ --pred_dir=/path/to/predictions/
We thank Tinghui Zhou and Clément Godard for their great works and repos.
If you find our work useful in your research, please consider citing our paper:
@inproceedings{yin2018geonet,
  title     = {GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose},
  author    = {Yin, Zhichao and Shi, Jianping},
  booktitle = {CVPR},
  year      = {2018}
}