# Geometry-Free View Synthesis: Transformers and no 3D Priors

*Is a geometric model required to synthesize novel views from a single image?*

Robin Rombach\*, Patrick Esser\*, Björn Ommer

\* equal contribution
RealEstate10K videos: short (2 min) / long (12 min)

ACID videos: short (2 min) / long (9 min)
## Demo

For a quickstart, you can try the Colab demo, but for a smoother experience we recommend installing the local demo as described below.
The demo requires building a PyTorch extension. If you have a sane development environment with PyTorch, g++ and nvcc, you can simply
```
pip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis
```
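If you are unsure about your GPU's compute capability, one quick way to check it is through PyTorch itself (a minimal sketch; assumes PyTorch is installed and a CUDA device is visible):

```
# Prints the (major, minor) compute capability of the default CUDA device
python -c "import torch; print(torch.cuda.get_device_capability())"
```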
If you run into problems and have a GPU with compute capability below 8, you can also use the provided conda environment:

```
git clone https://github.com/CompVis/geometry-free-view-synthesis
conda env create -f geometry-free-view-synthesis/environment.yaml
conda activate geofree
pip install geometry-free-view-synthesis/
```

After installation, running
`braindance.py` will start the demo on a sample scene. Explore the scene interactively using the WASD keys to move and the arrow keys to look around. Once positioned, hit the space bar to render the novel view with GeoGPT.
You can move again with the WASD keys. Mouse control can be activated with the m key. Run `braindance.py <folder to select image from/path to image>` to run the demo on your own images. By default, it uses the `re_impl_nodepth` model (trained on RealEstate without explicit transformation and no depth input), which can be changed with the `--model` flag. The corresponding checkpoints will be downloaded the first time they are required. Specify an output path using `--video path/to/vid.mp4` to record a video.
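As an example, a hypothetical invocation that runs the demo on your own images with the depth-conditioned RealEstate model and records the session to a video (both paths are placeholders):

```
braindance.py --model re_impl_depth --video renders/tour.mp4 /path/to/your/images
```

The full set of options is listed by the built-in help: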
```
> braindance.py -h
usage: braindance.py [-h] [--model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}]
                     [--video [VIDEO]] [path]

What's up, BD-maniacs?

key(s)       action
=====================================
wasd         move around
arrows       look around
m            enable looking with mouse
space        render with transformer
q            quit

positional arguments:
  path                  path to image or directory from which to select image.
                        Default example is used if not specified.

optional arguments:
  -h, --help            show this help message and exit
  --model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}
                        pretrained model to use.
  --video [VIDEO]       path to write video recording to. (no recording if
                        unspecified).
```

## Data Preparation

We support training on RealEstate10K and ACID. Both come in the same format as described here, and the preparation is the same for both of them. You will need to have `colmap` installed and available on your `$PATH`.
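Before starting the preparation, you may want to verify that `colmap` is actually discoverable on your `$PATH`; a minimal check (nothing project-specific is assumed):

```
# Prints the colmap location, or a warning if it cannot be found
command -v colmap || echo "colmap not found on PATH"
```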
We assume that you have extracted the `.txt` files of the dataset you want to prepare into `$TXT_ROOT`, e.g. for RealEstate:
```
> tree $TXT_ROOT
├── test
│   ├── 000c3ab189999a83.txt
│   ├── ...
│   └── fff9864727c42c80.txt
└── train
    ├── 0000cc6d8b108390.txt
    ├── ...
    └── ffffe622a4de5489.txt
```

and that you have downloaded the frames (we downloaded them in resolution 640 x 360) into `$IMG_ROOT`, e.g. for RealEstate:
```
> tree $IMG_ROOT
├── test
│   ├── 000c3ab189999a83
│   │   ├── 45979267.png
│   │   ├── ...
│   │   └── 55255200.png
│   ├── ...
│   ├── 0017ce4c6a39d122
│   │   ├── 40874000.png
│   │   ├── ...
│   │   └── 48482000.png
├── train
│   ├── ...
```

To prepare the `$SPLIT` split of the dataset (`$SPLIT` being one of `train`, `test` for RealEstate and `train`, `test`, `validation` for ACID) in `$SPA_ROOT`, run the following within the `scripts` directory:
```
python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}
```

You can also simply set `TXT_ROOT`, `IMG_ROOT` and `SPA_ROOT` as environment variables and run `./sparsify_realestate.sh` or `./sparsify_acid.sh`. Take a look into the sources to run with multiple workers in parallel.
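A possible end-to-end invocation using the environment-variable route described above (the three root paths are placeholders, and it is assumed here that the helper shell scripts live in the same `scripts` directory as the Python script):

```
# Placeholder paths -- adapt to where you extracted/downloaded the data
export TXT_ROOT=/path/to/realestate/txt
export IMG_ROOT=/path/to/realestate/frames
export SPA_ROOT=/path/to/realestate/sparse

cd scripts
./sparsify_realestate.sh   # or ./sparsify_acid.sh for ACID
```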
Finally, symlink `$SPA_ROOT` to `data/realestate_sparse`/`data/acid_sparse`.
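For RealEstate, that symlink step could look like this (a sketch; it assumes you run it from the repository root and that `data/` might not exist yet):

```
# Create the data directory if needed and point it at the prepared sparse data
mkdir -p data
ln -s "$SPA_ROOT" data/realestate_sparse   # use data/acid_sparse for ACID
```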
## Training

As described in our paper, we train the transformer models in a compressed, discrete latent space of pretrained VQGANs. These pretrained models can be conveniently downloaded by running
```
python scripts/download_vqmodels.py
```

which will also create symlinks ensuring that the paths specified in the training configs (see `configs/*`) exist. In case some of the models have already been downloaded, the script will only create the symlinks.
For training custom first stage models, we refer to the taming transformers repository.
Once both the data preparation and the first stage models are in place, the experiments on ACID and RealEstate10K described in our paper can be reproduced by running
```
python geofree/main.py --base configs/<dataset>/<dataset>_13x23_<experiment>.yaml -t --gpus 0,
```

where `<dataset>` is one of `realestate`/`acid` and `<experiment>` is one of `expl_img`/`expl_feat`/`expl_emb`/`impl_catdepth`/`impl_depth`/`impl_nodepth`/`hybrid`. These abbreviations correspond to the experiments compared in the main paper (see also Fig. 2 there).
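For example, to reproduce the implicit, no-depth variant on RealEstate10K on the first GPU, the template expands to the following (a plain instantiation of the placeholders above, with no extra flags assumed):

```
python geofree/main.py --base configs/realestate/realestate_13x23_impl_nodepth.yaml -t --gpus 0,
```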
Note that each experiment was conducted on a GPU with 40 GB VRAM.
## BibTeX

```
@misc{rombach2021geometryfree,
      title={Geometry-Free View Synthesis: Transformers and no 3D Priors},
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2104.07652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```