# Geometry-Free View Synthesis: Transformers and no 3D Priors

*Is a geometric model required to synthesize novel views from a single image?*

Robin Rombach\*, Patrick Esser\*, Björn Ommer

\* equal contribution
RealEstate10K videos: short (2 min) / long (12 min)

ACID videos: short (2 min) / long (9 min)
## Demo

For a quickstart, you can try the Colab demo, but for a smoother experience we recommend installing the local demo as described below.
The demo requires building a PyTorch extension. If you have a sane development environment with PyTorch, g++ and nvcc, you can simply
```
pip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis
```
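If you are unsure about your GPU's compute capability, one quick way to check it is through PyTorch itself (a minimal sketch; assumes PyTorch is installed and a CUDA device is visible):

```
# Prints the (major, minor) compute capability of the default CUDA device
python -c "import torch; print(torch.cuda.get_device_capability())"
```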
If you run into problems and have a GPU with compute capability below 8, you can also use the provided conda environment:

```
git clone https://github.com/CompVis/geometry-free-view-synthesis
conda env create -f geometry-free-view-synthesis/environment.yaml
conda activate geofree
pip install geometry-free-view-synthesis/
```

After installation, running
`braindance.py` will start the demo on a sample scene. Explore the scene interactively using the WASD keys to move and the arrow keys to look around. Once positioned, hit the space bar to render the novel view with GeoGPT.
You can move again with the WASD keys. Mouse control can be activated with the m key. Run `braindance.py <folder to select image from/path to image>` to run the demo on your own images. By default, it uses the `re_impl_nodepth` model (trained on RealEstate without explicit transformation and no depth input), which can be changed with the `--model` flag. The corresponding checkpoints will be downloaded the first time they are required. Specify an output path using `--video path/to/vid.mp4` to record a video.
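As an example, a hypothetical invocation that runs the demo on your own images with the depth-conditioned RealEstate model and records the session to a video (both paths are placeholders):

```
braindance.py --model re_impl_depth --video renders/tour.mp4 /path/to/your/images
```

The full set of options is listed by the built-in help: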
```
> braindance.py -h
usage: braindance.py [-h] [--model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}]
                     [--video [VIDEO]] [path]

What's up, BD-maniacs?

key(s)       action
=====================================
wasd         move around
arrows       look around
m            enable looking with mouse
space        render with transformer
q            quit

positional arguments:
  path                  path to image or directory from which to select image.
                        Default example is used if not specified.

optional arguments:
  -h, --help            show this help message and exit
  --model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}
                        pretrained model to use.
  --video [VIDEO]       path to write video recording to. (no recording if
                        unspecified).
```

## Data Preparation

We support training on RealEstate10K and ACID. Both come in the same format as described here, and the preparation is the same for both of them. You will need to have `colmap` installed and available on your `$PATH`.
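Before starting the preparation, you may want to verify that `colmap` is actually discoverable on your `$PATH`; a minimal check (nothing project-specific is assumed):

```
# Prints the colmap location, or a warning if it cannot be found
command -v colmap || echo "colmap not found on PATH"
```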
We assume that you have extracted the `.txt` files of the dataset you want to prepare into `$TXT_ROOT`, e.g. for RealEstate:
```
> tree $TXT_ROOT
├── test
│   ├── 000c3ab189999a83.txt
│   ├── ...
│   └── fff9864727c42c80.txt
└── train
    ├── 0000cc6d8b108390.txt
    ├── ...
    └── ffffe622a4de5489.txt
```

and that you have downloaded the frames (we downloaded them in resolution 640 x 360) into `$IMG_ROOT`, e.g. for RealEstate:
```
> tree $IMG_ROOT
├── test
│   ├── 000c3ab189999a83
│   │   ├── 45979267.png
│   │   ├── ...
│   │   └── 55255200.png
│   ├── ...
│   ├── 0017ce4c6a39d122
│   │   ├── 40874000.png
│   │   ├── ...
│   │   └── 48482000.png
├── train
│   ├── ...
```

To prepare the `$SPLIT` split of the dataset (`$SPLIT` being one of `train`, `test` for RealEstate and `train`, `test`, `validation` for ACID) in `$SPA_ROOT`, run the following within the `scripts` directory:
```
python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}
```

You can also simply set `TXT_ROOT`, `IMG_ROOT` and `SPA_ROOT` as environment variables and run `./sparsify_realestate.sh` or `./sparsify_acid.sh`. Take a look into the sources to run with multiple workers in parallel.
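A possible end-to-end invocation using the environment-variable route described above (the three root paths are placeholders, and it is assumed here that the helper shell scripts live in the same `scripts` directory as the Python script):

```
# Placeholder paths -- adapt to where you extracted/downloaded the data
export TXT_ROOT=/path/to/realestate/txt
export IMG_ROOT=/path/to/realestate/frames
export SPA_ROOT=/path/to/realestate/sparse

cd scripts
./sparsify_realestate.sh   # or ./sparsify_acid.sh for ACID
```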
Finally, symlink `$SPA_ROOT` to `data/realestate_sparse`/`data/acid_sparse`.
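For RealEstate, that symlink step could look like this (a sketch; it assumes you run it from the repository root and that `data/` might not exist yet):

```
# Create the data directory if needed and point it at the prepared sparse data
mkdir -p data
ln -s "$SPA_ROOT" data/realestate_sparse   # use data/acid_sparse for ACID
```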
## Training

As described in our paper, we train the transformer models in a compressed, discrete latent space of pretrained VQGANs. These pretrained models can be conveniently downloaded by running
```
python scripts/download_vqmodels.py
```

which will also create symlinks ensuring that the paths specified in the training configs (see `configs/*`) exist. In case some of the models have already been downloaded, the script will only create the symlinks.
For training custom first stage models, we refer to the taming transformers repository.
Once both the data preparation and the first stage models are in place, the experiments on ACID and RealEstate10K described in our paper can be reproduced by running
```
python geofree/main.py --base configs/<dataset>/<dataset>_13x23_<experiment>.yaml -t --gpus 0,
```

where `<dataset>` is one of `realestate`/`acid` and `<experiment>` is one of `expl_img`/`expl_feat`/`expl_emb`/`impl_catdepth`/`impl_depth`/`impl_nodepth`/`hybrid`. These abbreviations correspond to the experiments compared in the main paper (see also Fig. 2 there).
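For example, to reproduce the implicit, no-depth variant on RealEstate10K on the first GPU, the template expands to the following (a plain instantiation of the placeholders above, with no extra flags assumed):

```
python geofree/main.py --base configs/realestate/realestate_13x23_impl_nodepth.yaml -t --gpus 0,
```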
Note that each experiment was conducted on a GPU with 40 GB VRAM.
## BibTeX

```
@misc{rombach2021geometryfree,
      title={Geometry-Free View Synthesis: Transformers and no 3D Priors},
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2104.07652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```