SeGAN: Segmenting and Generating the Invisible (https://arxiv.org/pdf/1703.10239.pdf)
This project was presented as a spotlight at CVPR 2018.
Humans have a strong ability to make inferences about the appearance of the invisible and occluded parts of scenes. For example, when we look at a scene containing a sofa and a coffee table, we can make predictions about what is behind the coffee table, and can even complete the sofa based on the visible parts of the sofa, the coffee table, and what we know in general about sofas and coffee tables and how they occlude each other.
SeGAN can learn to:
- Generate the appearance of the occluded parts of objects,
- Segment the invisible parts of objects,
- Reliably segment natural images, although it is trained on synthetic photo-realistic images,
- Infer depth layering by reasoning about occluder-occludee relations.
If you find this project useful in your research, please consider citing:
```
@inproceedings{ehsani2018segan,
  title={Segan: Segmenting and generating the invisible},
  author={Ehsani, Kiana and Mottaghi, Roozbeh and Farhadi, Ali},
  booktitle={CVPR},
  year={2018}
}
```
The code requires:
- Torch 7 and dependencies from this repository.
- Linux OS
- NVIDIA GPU + CUDA + cuDNN
Clone the repository using the command:
```bash
git clone https://github.com/ehsanik/SeGAN
cd SeGAN
```
Download the dataset from here and extract it.

Make a link to the dataset:

```bash
ln -s /PATH/TO/DATASET dyce_data
```
Download the pretrained weights from here and extract them.

Make a link to the weights folder:

```bash
ln -s /PATH/TO/WEIGHTS weights
```
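As a quick sanity check (a sketch only, not part of the repository), you can verify from Torch that both links resolve to directories:

```lua
-- Sanity-check sketch: confirm that the dyce_data and weights symlinks
-- created above resolve to directories. Run from the repository root.
require 'paths'

assert(paths.dirp('dyce_data'), 'dyce_data link is missing or broken')
assert(paths.dirp('weights'), 'weights link is missing or broken')
print('dataset and weights folders found')
```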
We introduce DYCE, a dataset of synthetic occluded objects. This is a synthetic dataset with photo-realistic images and natural configuration of objects in scenes. All of the images of this dataset are taken in indoor scenes. The annotations for each image contain the segmentation mask for the visible and invisible regions of objects. The images are obtained by taking snapshots from our 3D synthetic scenes.
The number of synthetic scenes that we use is 11, where we use 7 scenes for training and validation, and 4 scenes for testing. Overall there are 5 living rooms and 6 kitchens, where 2 living rooms and 2 kitchens are used for testing. On average, each scene contains 60 objects and the number of visible objects per image is 17.5 (by visible we mean having at least 10 visible pixels). There is no common object instance in the train and test scenes.
The dataset can be downloaded from here.
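For a rough idea of how such annotations can be consumed in Torch, here is a minimal sketch; the file names are hypothetical placeholders, since the actual directory layout is defined by the extracted dataset and the data loader in this repository:

```lua
-- Sketch only: load one RGB snapshot plus the visible/invisible masks of one
-- object and form its full (amodal) mask. File names are hypothetical.
require 'torch'
require 'image'

local root = 'dyce_data'
local rgb       = image.load(root .. '/scene_0001.png', 3, 'float')                 -- snapshot of the scene
local visible   = image.load(root .. '/scene_0001_obj03_visible.png', 1, 'float')   -- visible region of one object
local invisible = image.load(root .. '/scene_0001_obj03_invisible.png', 1, 'float') -- occluded region of the same object

-- The full object mask is the union of its visible and invisible regions.
local full = torch.cmax(visible, invisible)
print(rgb:size())
print('full mask covers ' .. full:gt(0.5):float():sum() .. ' pixels')
```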
To train your own model:
```bash
th main.lua -baseLR 1e-3 -end2end -istrain "train"
```
See data_settings.lua for additional command line options.
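For reference, flags like these are typically declared with Torch's CmdLine utility. The sketch below only illustrates that mechanism with the options that appear in this README; the defaults are guesses, and the authoritative definitions live in data_settings.lua:

```lua
-- Illustration of torch.CmdLine option parsing. The defaults are guesses;
-- the real option list is defined in data_settings.lua.
require 'torch'

local cmd = torch.CmdLine()
cmd:option('-baseLR', 1e-3, 'base learning rate')
cmd:option('-end2end', false, 'run the segmentation and texture-generation parts jointly')
cmd:option('-istrain', 'train', '"train" or "test"')
cmd:option('-predictedSV', false, 'use the predicted visible mask instead of the ground truth')
cmd:option('-weights_segmentation', '', 'path to pretrained segmentation weights')
cmd:option('-weights_texture', '', 'path to pretrained texture weights')

local opt = cmd:parse(arg or {})
print(opt)
```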
To test using the pretrained models and reproduce the results reported in the paper (summarized below):
| Model | Segmentation (Visible ∪ Invisible) | Segmentation (Visible) | Segmentation (Invisible) | Texture (L1) | Texture (L2) |
|---|---|---|---|---|---|
| Multipath | 47.51 | 48.58 | 6.01 | - | - |
| SeGAN (ours) w/ SV predicted | 68.78 | 64.76 | 15.59 | 0.070 | 0.023 |
| SeGAN (ours) w/ SV gt | 75.71 | 68.05 | 23.26 | 0.026 | 0.008 |
Using the predicted visible mask as input:

```bash
th main.lua -weights_segmentation "weights/segment" -end2end -weights_texture "weights/texture" -istrain "test" -predictedSV
```
To test using the ground-truth visible mask as input instead of the predicted mask:
```bash
th main.lua -weights_segmentation "weights/segment_gt_sv" -end2end -weights_texture "weights/texture_gt_sv" -istrain "test"
```
Code for the GAN network borrows heavily from pix2pix.