Is a geometric model required to synthesize novel views from a single image?

CompVis/geometry-free-view-synthesis

Geometry-Free View Synthesis: Transformers and no 3D Priors
Robin Rombach*, Patrick Esser*, Björn Ommer
* equal contribution

arXiv | BibTeX | Colab

Interactive Scene Exploration Results

RealEstate10K:
Videos: short (2min) / long (12min)

ACID:
Videos: short (2min) / long (9min)

Demo

For a quickstart, you can try the Colab demo, but for a smoother experience we recommend installing the local demo as described below.

Installation

The demo requires building a PyTorch extension. If you have a sane development environment with PyTorch, g++ and nvcc, you can simply run

pip install git+https://github.com/CompVis/geometry-free-view-synthesis#egg=geometry-free-view-synthesis

If you run into problems and have a GPU with compute capability below 8, you can also use the provided conda environment:

git clone https://github.com/CompVis/geometry-free-view-synthesis
conda env create -f geometry-free-view-synthesis/environment.yaml
conda activate geofree
pip install geometry-free-view-synthesis/
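Before choosing between the two installation routes, it can help to confirm which prerequisites are actually present. The following is a minimal sketch, not part of the repository: the helper name `check_build_tools` is hypothetical, and it only looks up `g++` and `nvcc` on the `PATH` and, if PyTorch happens to be importable, reports the compute capability of the first GPU.

```python
import shutil


def check_build_tools(tools=("g++", "nvcc")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_build_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")

    # PyTorch is optional for this check; report on it only if it is importable.
    try:
        import torch
    except ImportError:
        print("torch: NOT INSTALLED")
    else:
        print(f"torch: {torch.__version__}")
        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability(0)
            print(f"compute capability of GPU 0: {major}.{minor}")
```

If a tool is reported as missing, or the compute capability is below 8 and the direct `pip install` fails, the conda environment is the route the README suggests.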

Running

After installation, running

braindance.py

will start the demo on a sample scene. Explore the scene interactively using the WASD keys to move and the arrow keys to look around. Once positioned, hit the space bar to render the novel view with GeoGPT.

You can move again with the WASD keys. Mouse control can be activated with the m key. Run braindance.py <folder to select image from/path to image> to run the demo on your own images. By default, it uses the re_impl_nodepth model (trained on RealEstate without explicit transformation and no depth input), which can be changed with the --model flag. The corresponding checkpoints will be downloaded the first time they are required. Specify an output path using --video path/to/vid.mp4 to record a video.

> braindance.py -h
usage: braindance.py [-h] [--model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}] [--video [VIDEO]] [path]

What's up, BD-maniacs?

key(s)       action
=====================================
wasd         move around
arrows       look around
m            enable looking with mouse
space        render with transformer
q            quit

positional arguments:
  path                  path to image or directory from which to select image. Default example is used if not specified.

optional arguments:
  -h, --help            show this help message and exit
  --model {re_impl_nodepth,re_impl_depth,ac_impl_nodepth,ac_impl_depth}
                        pretrained model to use.
  --video [VIDEO]       path to write video recording to. (no recording if unspecified).
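The help text above fully describes the command-line interface, so it can be mirrored with `argparse` when wrapping the demo in another tool. This is a reconstruction from the help output only, not the actual source of `braindance.py`; the function name `build_parser` and the default of `None` for `path` and `--video` are assumptions.

```python
import argparse

# Model names copied from the help output above.
MODELS = ["re_impl_nodepth", "re_impl_depth", "ac_impl_nodepth", "ac_impl_depth"]


def build_parser():
    """Build a parser that accepts the same arguments as the help text shows."""
    parser = argparse.ArgumentParser(
        prog="braindance.py",
        description="What's up, BD-maniacs?",
    )
    # "[path]" in the usage line means the positional argument is optional.
    parser.add_argument(
        "path",
        nargs="?",
        default=None,
        help="path to image or directory from which to select image. "
        "Default example is used if not specified.",
    )
    parser.add_argument(
        "--model",
        choices=MODELS,
        default="re_impl_nodepth",  # the default named in the README text
        help="pretrained model to use.",
    )
    # "[--video [VIDEO]]" means the flag may appear with or without a value.
    parser.add_argument(
        "--video",
        nargs="?",
        default=None,
        help="path to write video recording to. (no recording if unspecified).",
    )
    return parser
```

For example, `build_parser().parse_args(["--model", "ac_impl_depth", "my_image.png"])` selects the ACID model with depth input for a hypothetical image `my_image.png`.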

Training

Data Preparation

We support training on RealEstate10K and ACID. Both come in the same format as described here and the preparation is the same for both of them. You will need to have colmap installed and available on your $PATH.

We assume that you have extracted the .txt files of the dataset you want to prepare into $TXT_ROOT, e.g. for RealEstate:

> tree $TXT_ROOT
├── test
│   ├── 000c3ab189999a83.txt
│   ├── ...
│   └── fff9864727c42c80.txt
└── train
    ├── 0000cc6d8b108390.txt
    ├── ...
    └── ffffe622a4de5489.txt

and that you have downloaded the frames (we downloaded them in resolution 640 x 360) into $IMG_ROOT, e.g. for RealEstate:

> tree $IMG_ROOT
├── test
│   ├── 000c3ab189999a83
│   │   ├── 45979267.png
│   │   ├── ...
│   │   └── 55255200.png
│   ├── ...
│   ├── 0017ce4c6a39d122
│   │   ├── 40874000.png
│   │   ├── ...
│   │   └── 48482000.png
├── train
│   ├── ...
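Since the two trees are expected to line up (one frame directory per `.txt` file, named after the sequence ID), a quick consistency check before running the preparation can save time. The sketch below is an illustration and not part of the repository; the function name `find_missing_frame_dirs` is hypothetical, and it assumes only the layout shown above.

```python
from pathlib import Path


def find_missing_frame_dirs(txt_root, img_root, split):
    """Return sequence IDs that have a .txt file but no frame directory."""
    txt_dir = Path(txt_root) / split
    img_dir = Path(img_root) / split
    missing = []
    for txt_file in sorted(txt_dir.glob("*.txt")):
        # The frame directory is expected to be named after the .txt file stem.
        if not (img_dir / txt_file.stem).is_dir():
            missing.append(txt_file.stem)
    return missing
```

Calling `find_missing_frame_dirs(txt_root, img_root, "test")` with the two roots returns an empty list when every sequence in the split has its frames downloaded.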

To prepare the $SPLIT split of the dataset ($SPLIT being one of train, test for RealEstate and train, test, validation for ACID) in $SPA_ROOT, run the following within the scripts directory:

python sparse_from_realestate_format.py --txt_src ${TXT_ROOT}/${SPLIT} --img_src ${IMG_ROOT}/${SPLIT} --spa_dst ${SPA_ROOT}/${SPLIT}

You can also simply set TXT_ROOT, IMG_ROOT and SPA_ROOT as environment variables and run ./sparsify_realestate.sh or ./sparsify_acid.sh. Take a look into the sources to run with multiple workers in parallel.
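To prepare every split of a dataset in one go, the command above can be generated per split. The following sketch only assembles the argument lists from the flags shown in this README (`--txt_src`, `--img_src`, `--spa_dst`); the helper name `build_sparsify_commands` is hypothetical, and it does not replicate whatever the provided shell scripts do beyond that.

```python
# Splits per dataset, as listed in the README.
SPLITS = {
    "realestate": ("train", "test"),
    "acid": ("train", "test", "validation"),
}


def build_sparsify_commands(dataset, txt_root, img_root, spa_root):
    """Return one argument list per split, mirroring the command shown above."""
    commands = []
    for split in SPLITS[dataset]:
        commands.append([
            "python", "sparse_from_realestate_format.py",
            "--txt_src", f"{txt_root}/{split}",
            "--img_src", f"{img_root}/{split}",
            "--spa_dst", f"{spa_root}/{split}",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True, cwd="scripts")`, since the README asks for the script to be run from within the scripts directory.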

Finally, symlink $SPA_ROOT to data/realestate_sparse (for RealEstate) or data/acid_sparse (for ACID).
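The symlink step can also be scripted. This is a minimal sketch under the assumption that it is run from the repository root, so that `data/` is resolved relative to the current directory; the helper name `link_sparse_root` and its `data_dir` parameter are hypothetical.

```python
from pathlib import Path

# Link names taken from the README; keys are the dataset names used in the configs.
LINK_NAMES = {"realestate": "realestate_sparse", "acid": "acid_sparse"}


def link_sparse_root(spa_root, dataset, data_dir="data"):
    """Create data_dir/<dataset>_sparse as a symlink pointing at spa_root."""
    link = Path(data_dir) / LINK_NAMES[dataset]
    link.parent.mkdir(parents=True, exist_ok=True)
    # Replace a stale symlink; a real directory at this path is left alone
    # and makes symlink_to raise FileExistsError.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(Path(spa_root).resolve(), target_is_directory=True)
    return link
```

For instance, `link_sparse_root(os.environ["SPA_ROOT"], "acid")` creates `data/acid_sparse` (with `os` imported by the caller).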

First Stage Models

As described in our paper, we train the transformer models in a compressed, discrete latent space of pretrained VQGANs. These pretrained models can be conveniently downloaded by running

python scripts/download_vqmodels.py

which will also create symlinks ensuring that the paths specified in the training configs (see configs/*) exist. In case some of the models have already been downloaded, the script will only create the symlinks.

For training custom first stage models, we refer to the taming transformers repository.

Running the Training

Once the data and the first stage models are prepared, the experiments on ACID and RealEstate10K as described in our paper can be reproduced by running

python geofree/main.py --base configs/<dataset>/<dataset>_13x23_<experiment>.yaml -t --gpus 0,

where <dataset> is one of realestate/acid and <experiment> is one of expl_img/expl_feat/expl_emb/impl_catdepth/impl_depth/impl_nodepth/hybrid. These abbreviations correspond to the experiments listed in the following table (see also Fig. 2 in the main paper):

[Table: variants]

Note that each experiment was conducted on a GPU with 40 GB VRAM.
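The config naming scheme above can be expanded programmatically, for example to queue several runs. The sketch below builds paths and commands purely from the pattern `configs/<dataset>/<dataset>_13x23_<experiment>.yaml` given in this README; the helper names are hypothetical, and whether a config file exists for every combination has not been checked here.

```python
from itertools import product

DATASETS = ("realestate", "acid")
EXPERIMENTS = (
    "expl_img", "expl_feat", "expl_emb",
    "impl_catdepth", "impl_depth", "impl_nodepth", "hybrid",
)


def config_path(dataset, experiment):
    """Return the config path for one dataset/experiment pair."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    return f"configs/{dataset}/{dataset}_13x23_{experiment}.yaml"


def training_command(dataset, experiment, gpus="0,"):
    """Return the training command from the README as an argument list."""
    return [
        "python", "geofree/main.py",
        "--base", config_path(dataset, experiment),
        "-t", "--gpus", gpus,
    ]


def all_config_paths():
    """List the config path for every dataset/experiment combination."""
    return [config_path(d, e) for d, e in product(DATASETS, EXPERIMENTS)]
```

`training_command("acid", "hybrid")` yields the same command as the one shown above with `<dataset>` and `<experiment>` filled in.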

BibTeX

@misc{rombach2021geometryfree,
      title={Geometry-Free View Synthesis: Transformers and no 3D Priors},
      author={Robin Rombach and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2104.07652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
