Tutorial

This tutorial covers the topic of image-based 3D reconstruction by demonstrating the individual processing steps in COLMAP. If you are interested in a more general and mathematical introduction to the topic of image-based 3D reconstruction, please also refer to the CVPR 2017 Tutorial on Large-scale 3D Modeling from Crowdsourced Data and [schoenberger_thesis].

Image-based 3D reconstruction traditionally first recovers a sparse representation of the scene and the camera poses of the input images using Structure-from-Motion. This output then serves as the input to Multi-View Stereo to recover a dense representation of the scene.

Quickstart

First, start the graphical user interface of COLMAP, as described under Data Structure below. COLMAP provides an automatic reconstruction tool that simply takes a folder of input images and produces a sparse and dense reconstruction in a workspace folder. Click Reconstruction > Automatic Reconstruction in the GUI and specify the relevant options. The output is written to the workspace folder. For example, if your images are located in path/to/project/images, you could select path/to/project as the workspace folder. After running the automatic reconstruction tool, the folder would look similar to this:

+── images
│   +── image1.jpg
│   +── image2.jpg
│   +── ...
+── sparse
│   +── 0
│   │   +── cameras.bin
│   │   +── images.bin
│   │   +── points3D.bin
│   +── ...
+── dense
│   +── 0
│   │   +── images
│   │   +── sparse
│   │   +── stereo
│   │   +── fused.ply
│   │   +── meshed-poisson.ply
│   │   +── meshed-delaunay.ply
│   +── ...
+── database.db

Here, path/to/project/sparse contains the sparse models for all reconstructed components, while path/to/project/dense contains their corresponding dense models. The dense point cloud fused.ply can be imported in COLMAP using File > Import model from..., while the dense mesh must be visualized with an external viewer such as Meshlab.
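As a minimal sketch of how the workspace layout can be inspected from a script, the following Python snippet lists the sub-folders of sparse that contain all three model files named in the example tree. The function name find_sparse_models is hypothetical and not part of COLMAP, and the check assumes the binary file names shown in the example layout.

```python
from pathlib import Path

# File names taken from the example workspace layout in this tutorial.
SPARSE_MODEL_FILES = ("cameras.bin", "images.bin", "points3D.bin")


def find_sparse_models(workspace):
    """Return the sub-folders of <workspace>/sparse that hold all model files."""
    sparse_dir = Path(workspace) / "sparse"
    if not sparse_dir.is_dir():
        return []
    models = []
    for candidate in sorted(sparse_dir.iterdir()):
        has_all_files = all(
            (candidate / name).is_file() for name in SPARSE_MODEL_FILES
        )
        if candidate.is_dir() and has_all_files:
            models.append(candidate)
    return models


if __name__ == "__main__":
    # Hypothetical workspace path; prints nothing if the folder does not exist.
    for model_dir in find_sparse_models("path/to/project"):
        print(model_dir)
```

Each returned folder corresponds to one reconstructed component, such as sparse/0 in the example.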

The following sections give general recommendations and describe the reconstruction process in more detail, in case you need more control over the reconstruction process and its parameters or are interested in the underlying technology in COLMAP.

Structure-from-Motion

Figure: COLMAP’s incremental Structure-from-Motion pipeline.

Structure-from-Motion (SfM) is the process of reconstructing 3D structure from its projections into a series of images. The input is a set of overlapping images of the same object, taken from different viewpoints. The output is a 3D reconstruction of the object, and the reconstructed intrinsic and extrinsic camera parameters of all images. Typically, Structure-from-Motion systems divide this process into three stages:

  1. Feature detection and extraction

  2. Feature matching and geometric verification

  3. Structure and motion reconstruction

COLMAP reflects these stages in different modules that can be combined depending on the application. More information on Structure-from-Motion in general and the algorithms in COLMAP can be found in [schoenberger16sfm] and [schoenberger16mvs].

If you have control over the picture capture process, please follow these guidelines for optimal reconstruction results:

  • Capture images with good texture. Avoid completely texture-less images (e.g., a white wall or empty desk). If the scene does not contain enough texture itself, you could place additional background objects, such as posters, etc.

  • Capture images under similar illumination conditions. Avoid high dynamic range scenes (e.g., pictures against the sun with shadows or pictures through doors/windows). Avoid specularities on shiny surfaces.

  • Capture images with high visual overlap. Make sure that each object is seen in at least 3 images; the more images the better.

  • Capture images from different viewpoints. Do not take images from the same location by only rotating the camera; instead, take a few steps after each shot. At the same time, try to have enough images from a relatively similar viewpoint. Note that more images are not necessarily better and might lead to a slow reconstruction process. If you use a video as input, consider down-sampling the frame rate.
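The advice on down-sampling video input can be sketched in a few lines of Python. This is only an illustration that keeps every step-th frame of an already extracted frame sequence; the function name subsample_frames and the frame naming scheme are assumptions, and a suitable step depends on how fast the camera moves.

```python
def subsample_frames(frame_names, step):
    """Keep every `step`-th frame, in sorted file-name order."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return sorted(frame_names)[::step]


if __name__ == "__main__":
    # Hypothetical frame names, as they might come from a video export.
    frames = [f"frame{i:04d}.jpg" for i in range(10)]
    print(subsample_frames(frames, 3))
```

Sorting by file name keeps the temporal order intact as long as the frame numbers are zero-padded.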

Multi-View Stereo

Multi-View Stereo (MVS) takes the output of SfM to compute depth and/or normal information for every pixel in an image. Fusion of the depth and normal maps of multiple images in 3D then produces a dense point cloud of the scene. Using the depth and normal information of the fused point cloud, algorithms such as the (screened) Poisson surface reconstruction [kazhdan2013] can then recover the 3D surface geometry of the scene. More information on Multi-View Stereo in general and the algorithms in COLMAP can be found in [schoenberger16mvs].

Preface

COLMAP requires only a few steps to perform a standard reconstruction for a general user. For more experienced users, the program exposes many different parameters, only some of which are intuitive to a beginner. The program should usually work without the need to modify any parameters. The defaults are chosen as a trade-off between reconstruction robustness/quality and speed. You can set “optimal” options for different reconstruction scenarios by choosing Extras > Set options for ... data. If in doubt about which settings to choose, stick to the defaults. The source code contains more documentation about all parameters.

COLMAP is research software and in rare cases it may exit ungracefully if some constraints are not fulfilled. In this case, the program prints a traceback to stdout. To see this traceback or more debug information, it is recommended to run the executables (including the GUI) from the command-line, where you can define various levels of logging verbosity.

Terminology

The term camera is associated with the physical object of a camera using the same zoom factor and lens. A camera defines the intrinsic projection model in COLMAP. A single camera can take multiple images with the same resolution, intrinsic parameters, and distortion characteristics. The term image is associated with a bitmap file, e.g., a JPEG or PNG file on disk. COLMAP detects keypoints in each image whose appearance is described by numerical descriptors. Pure appearance-based correspondences between keypoints/descriptors are defined by matches, while inlier matches are geometrically verified and used for the reconstruction procedure.

Data Structure

COLMAP assumes that all input images are in one input directory with potentially nested sub-directories. It recursively considers all images stored in this directory, and it supports various image formats (see FreeImage). Other files are automatically ignored. If high performance is a requirement, then you should separate any files that are not images. Images are identified uniquely by their relative file path. For later processing, such as image undistortion or dense reconstruction, the relative folder structure should be preserved. COLMAP does not modify the input images or directory and all extracted data is stored in a single, self-contained SQLite database file (see Database Format).
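To illustrate what it means for images to be identified by their relative file path, here is a minimal Python sketch that walks an image folder recursively and returns those relative paths. The extension set is a small illustrative assumption (COLMAP itself supports more formats), and list_images is a hypothetical helper, not a COLMAP function.

```python
from pathlib import Path

# Assumption: a small illustrative set of extensions, not COLMAP's full list.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(image_dir):
    """Return the relative paths of all images below image_dir, sorted."""
    root = Path(image_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    # Hypothetical image folder; prints nothing if it does not exist.
    for name in list_images("path/to/project/images"):
        print(name)
```

Two files with the same name in different sub-folders yield different relative paths, which is why the folder structure should be preserved for later processing.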

The first step is to start the graphical user interface of COLMAP by running the pre-built binaries (Windows: COLMAP.bat, Mac: COLMAP.app) or by executing ./src/colmap/exe/colmap gui from the CMake build folder. Next, create a new project by choosing File > New project. In this dialog, you must select where to store the database and the folder that contains the input images. For convenience, you can save the entire project settings to a configuration file by choosing File > Save project. The project configuration stores the absolute path information of the database and image folder in addition to any other parameter settings. If you decide to move the database or image folder, you must change the paths accordingly by creating a new project. Alternatively, the resulting .ini configuration file can be directly modified in a text editor of your choice. To reopen an existing project, you can simply open the configuration file by choosing File > Open project and all parameter settings should be recovered. Note that all COLMAP executables can be started from the command-line by either specifying individual settings as command-line arguments or by providing the path to the project configuration file (see Interface).

An example folder structure could look like this:

/path/to/project/...
+── images
│   +── image1.jpg
│   +── image2.jpg
│   +── ...
│   +── imageN.jpg
+── database.db
+── project.ini

In this example, you would select /path/to/project/images as the image folder path, /path/to/project/database.db as the database file path, and save the project configuration to /path/to/project/project.ini.

Feature Detection and Extraction

In the first step, feature detection/extraction finds sparse feature points in the image and describes their appearance using a numerical descriptor. COLMAP imports images and performs feature detection/extraction in one step in order to load images from disk only once.

Next, choose Processing > Extract features. In this dialog, you must first decide on the employed intrinsic camera model. You can either automatically extract focal length information from the embedded EXIF information or manually specify intrinsic parameters, e.g., as obtained in a lab calibration. If an image has partial EXIF information, COLMAP tries to find the missing camera specifications in a large database of camera models automatically. If all your images were captured by the same physical camera with identical zoom factor, it is recommended to share intrinsics between all images. Note that the program will exit ungracefully if the same camera model is shared among all images but not all images have the same size or EXIF focal length. If you have several groups of images that share the same intrinsic camera parameters, you can easily modify the camera models at a later point as well (see Database Management). If in doubt about what to choose in this step, simply stick to the default parameters.

You can either detect and extract new features from the images or import existing features from text files. COLMAP extracts SIFT [lowe04] features either on the GPU or the CPU. The GPU version requires an attached display, while the CPU version is recommended for use on a server. In general, the GPU version is favorable as it has a customized feature detection mode that often produces higher quality features in the case of high contrast images. If you import existing features, every image must have a text file next to it (e.g., /path/to/image1.jpg and /path/to/image1.jpg.txt) in the following format:

NUM_FEATURES 128
X Y SCALE ORIENTATION D_1 D_2 D_3 ... D_128
...
X Y SCALE ORIENTATION D_1 D_2 D_3 ... D_128

where X, Y, SCALE, ORIENTATION are floating point numbers and D_1 … D_128 are values in the range 0 … 255. After the header line, the file should have NUM_FEATURES lines, with one line per feature. For example, if an image has 4 features, then the text file should look something like this:

4 128
1.2 2.3 0.1 0.3 1 2 3 4 ... 21
2.2 3.3 1.1 0.3 3 2 3 2 ... 32
0.2 1.3 1.1 0.3 3 2 3 2 ... 2
1.2 2.3 1.1 0.3 3 2 3 2 ... 3

Note that by convention the upper left corner of an image has coordinate (0, 0) and the center of the upper left most pixel has coordinate (0.5, 0.5). If you must import features for large image collections, it is much more efficient to directly access the database with your favorite scripting language (see Database Format).
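The feature text format can be produced and read back with a short Python sketch. It assumes that descriptor values are integers in the range 0 to 255 and that every descriptor has 128 entries, as in the format description; write_features and read_features are hypothetical helper names, not part of COLMAP.

```python
def write_features(path, features):
    """Write (x, y, scale, orientation, descriptor) tuples as a feature text file."""
    with open(path, "w") as f:
        # Header line: number of features and descriptor dimension.
        f.write(f"{len(features)} 128\n")
        for x, y, scale, orientation, descriptor in features:
            if len(descriptor) != 128:
                raise ValueError("descriptor must have 128 values")
            if any(not 0 <= d <= 255 for d in descriptor):
                raise ValueError("descriptor values must be in 0..255")
            values = [x, y, scale, orientation, *descriptor]
            f.write(" ".join(str(v) for v in values) + "\n")


def read_features(path):
    """Read a feature text file back into a list of tuples."""
    with open(path) as f:
        num_features, dim = (int(token) for token in f.readline().split())
        features = []
        for _ in range(num_features):
            tokens = f.readline().split()
            x, y, scale, orientation = (float(t) for t in tokens[:4])
            descriptor = [int(t) for t in tokens[4:4 + dim]]
            features.append((x, y, scale, orientation, descriptor))
    return features
```

For an image stored as /path/to/image1.jpg, the file would be written to /path/to/image1.jpg.txt, following the naming rule stated earlier.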

If you are done setting all options, choose Extract and wait for the extraction to finish or cancel. If you cancel during the extraction process, the next time you start extracting images for the same project, COLMAP automatically continues where it left off. This also allows you to add images to an existing project/reconstruction. In this case, be sure to verify the camera parameters when using shared intrinsics.

All extracted data will be stored in the database file and can be reviewed/managed in the database management tool (see Database Management) or, for experts, directly modified using SQLite (see Database Format).

Feature Matching and Geometric Verification

In the second step, feature matching and geometric verification find correspondences between the feature points in different images.

Choose Processing > Feature matching and select one of the provided matching modes, which are intended for different input scenarios:

  • Exhaustive Matching: If the number of images in your dataset is relatively low (up to several hundreds), this matching mode should be fast enough and leads to the best reconstruction results. Here, every image is matched against every other image, while the block size determines how many images are loaded from disk into memory at the same time.

  • Sequential Matching: This mode is useful if the images are acquired in sequential order, e.g., by a video camera. In this case, consecutive frames have visual overlap and there is no need to match all image pairs exhaustively. Instead, consecutively captured images are matched against each other. This matching mode has built-in loop detection based on a vocabulary tree, where every N-th image (loop_detection_period) is matched against its visually most similar images (loop_detection_num_images). Note that image file names must be ordered sequentially (e.g., image0001.jpg, image0002.jpg, etc.). The order in the database is not relevant, since the images are explicitly ordered according to their file names. Note that loop detection requires a pre-trained vocabulary tree that can be downloaded from https://demuc.de/colmap/.

  • Vocabulary Tree Matching: In this matching mode [schoenberger16vote], every image is matched against its visual nearest neighbors using a vocabulary tree with spatial re-ranking. This is the recommended matching mode for large image collections (several thousands). This requires a pre-trained vocabulary tree that can be downloaded from https://demuc.de/colmap/.

  • Spatial Matching: This matching mode matches every image against its spatial nearest neighbors. Spatial locations can be manually set in the database management tool. By default, COLMAP also extracts GPS information from EXIF and uses it for spatial nearest neighbor search. If accurate prior location information is available, this is the recommended matching mode.

  • Transitive Matching: This matching mode uses the transitive relations of already existing feature matches to produce a more complete matching graph. If an image A matches to an image B and B matches to C, then this matcher attempts to match A to C directly.

  • Custom Matching: This mode allows you to specify individual image pairs for matching or to import individual feature matches. To specify image pairs, you have to provide a text file with one image pair per line:

    image1.jpg image2.jpg
    image1.jpg image3.jpg
    ...

    where image1.jpg is the relative path in the image folder. You have two options for importing individual feature matches: either raw feature matches, which are not geometrically verified, or already geometrically verified feature matches. In both cases, the expected format is:

    image1.jpg image2.jpg
    0 1
    1 2
    3 4
    <empty-line>
    image1.jpg image3.jpg
    0 1
    1 2
    3 4
    4 5
    <empty-line>
    ...

    where image1.jpg is the relative path in the image folder and the pairs of numbers are zero-based feature indices in the respective images. If you must import many matches for large image collections, it is more efficient to directly access the database with a scripting language of your choice.
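The two custom matching file formats can be generated with a small Python sketch. The helper names write_image_pairs and write_matches are hypothetical, and the sketch assumes that image paths contain no spaces, since names and indices are separated by single spaces.

```python
def write_image_pairs(path, pairs):
    """Write one image pair per line, using relative image paths."""
    with open(path, "w") as f:
        for name1, name2 in pairs:
            f.write(f"{name1} {name2}\n")


def write_matches(path, matches_by_pair):
    """Write feature matches, one block per image pair.

    matches_by_pair maps (name1, name2) to a list of (index1, index2)
    tuples holding zero-based feature indices in the two images.
    """
    with open(path, "w") as f:
        for (name1, name2), matches in matches_by_pair.items():
            f.write(f"{name1} {name2}\n")
            for index1, index2 in matches:
                f.write(f"{index1} {index2}\n")
            # An empty line terminates the block of each image pair.
            f.write("\n")
```

The same matches file layout applies to both raw and geometrically verified matches; which of the two is meant is chosen when importing.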

If you are done setting all options, choose Match and wait for the matching to finish or cancel in between. Note that this step can take a significant amount of time depending on the number of images, the number of features per image, and the chosen matching mode. Expected times for exhaustive matching range from a few minutes for tens of images to a few hours for hundreds of images to days or weeks for thousands of images. If you cancel the matching process or import new images after matching, COLMAP only matches image pairs that have not been matched previously. The overhead of skipping already matched image pairs is low. This also makes it possible to match additional images imported after an initial matching and to combine different matching modes for the same dataset.
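A rough way to see why exhaustive matching time grows so quickly is to count image pairs. The Python sketch below compares the number of pairs in exhaustive matching with a simplified sequential scheme in which each image is paired with a fixed number of following images. The overlap parameter here is an illustrative assumption; the actual sequential matcher additionally performs loop detection and may choose its pairs differently.

```python
def exhaustive_pair_count(num_images):
    """Number of unordered image pairs when every image meets every other."""
    return num_images * (num_images - 1) // 2


def sequential_pair_count(num_images, overlap):
    """Pairs when each image is matched against the next `overlap` images."""
    return sum(
        min(overlap, num_images - 1 - index) for index in range(num_images)
    )


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, exhaustive_pair_count(n), sequential_pair_count(n, 10))
```

The exhaustive count grows quadratically with the number of images, while the windowed count grows roughly linearly, which matches the recommendation to reserve exhaustive matching for smaller datasets.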

All extracted data will be stored in the database file and can be reviewed/managed in the database management tool (see Database Management) or, for experts, directly modified using SQLite (see Database Format).

Note that feature matching requires a GPU and that the display performance of your computer might degrade significantly during the matching process. If your system has multiple CUDA-enabled GPUs, you can select specific GPUs with the gpu_index option.

Sparse Reconstruction

After producing the scene graph in the previous two steps, you can start the incremental reconstruction process by choosing Reconstruction > Start. COLMAP first loads all extracted data from the database into memory and seeds the reconstruction from an initial image pair. Then, the scene is incrementally extended by registering new images and triangulating new points. The results are visualized in “real-time” during this reconstruction process. Refer to the Graphical User Interface section for more details about the available controls. COLMAP attempts to reconstruct multiple models if not all images are registered into the same model. The different models can be selected from the drop-down menu in the toolbar. If the different models have common registered images, you can use the model_converter executable to merge them into a single reconstruction (see FAQ for details).

Ideally, the reconstruction works fine and all images are registered. If this is not the case, it is recommended to:

  • Perform additional matching. For best results, use exhaustive matching, enable guided matching, increase the number of nearest neighbors in vocabulary tree matching, or increase the overlap in sequential matching, etc.

  • Manually choose an initial image pair, if COLMAP fails to initialize. Choose Reconstruction > Reconstruction options > Init and set images from the database management tool that have enough matches from different viewpoints.

Importing and Exporting

COLMAP provides several export options for further processing. For full flexibility, it is recommended to export the reconstruction in COLMAP’s data format by choosing File > Export to export the currently viewed model or File > Export all to export all reconstructed models. The model is exported in the selected folder using separate text files for the reconstructed cameras, images, and points. When exporting in COLMAP’s data format, you can re-import the reconstruction for later visualization, image undistortion, or to continue an existing reconstruction from where it left off (e.g., after importing and matching new images). To import a model, choose File > Import and select the export folder path. Alternatively, you can also export the model in various other formats, such as Bundler, VisualSfM [1], PLY, or VRML by choosing File > Export as.... COLMAP can visualize plain PLY point cloud files with RGB information by choosing File > Import From.... Further information about the format of the exported models can be found in the output format documentation.

Dense Reconstruction

After reconstructing a sparse representation of the scene and the camera poses of the input images, MVS can now recover denser scene geometry. COLMAP has an integrated dense reconstruction pipeline to produce depth and normal maps for all registered images, to fuse the depth and normal maps into a dense point cloud with normal information, and to finally estimate a dense surface from the fused point cloud using Poisson [kazhdan2013] or Delaunay reconstruction.

To get started, import your sparse 3D model into COLMAP (or select the reconstructed model after finishing the previous sparse reconstruction steps). Then, choose Reconstruction > Multi-view stereo and select an empty or existing workspace folder, which is used for the output of all dense reconstruction results. The first step is to undistort the images, the second to compute the depth and normal maps using stereo, and the third to fuse the depth and normal maps into a point cloud, followed by a final, optional point cloud meshing step. During the stereo reconstruction process, the display might freeze due to heavy compute load and, if your GPU does not have enough memory, the reconstruction process might crash ungracefully. Please refer to the FAQ (freeze and memory) for information on how to avoid these problems. Note that the reconstructed normals of the point cloud cannot be directly visualized in COLMAP, but they can be shown, e.g., in Meshlab by enabling Render > Show Normal/Curvature. Similarly, the reconstructed dense surface mesh model must be visualized with external software.

In addition to the internal dense reconstruction functionality, COLMAP exports to several other dense reconstruction libraries, such as CMVS/PMVS [furukawa10] or CMP-MVS [jancosek11]. Please choose Extras > Undistort images and select the appropriate format. The output folders contain the reconstruction and the undistorted images. In addition, the folders contain sample shell scripts to perform the dense reconstruction. To run PMVS2, execute the following command:

./path/to/pmvs2 /path/to/undistortion/folder/pmvs/ option-all

where /path/to/undistortion/folder is the folder selected in the undistortion dialog. Make sure not to forget the trailing slash in /path/to/undistortion/folder/pmvs/ in the above command-line arguments.
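When the PMVS2 call is scripted, the trailing slash is easy to lose. The following Python sketch builds the argument list for the command above and always appends the slash; pmvs2_command is a hypothetical helper, the paths are the placeholders from this tutorial, and POSIX-style separators are assumed.

```python
def pmvs2_command(pmvs2_binary, undistortion_folder):
    """Build the PMVS2 argument list with the required trailing slash."""
    pmvs_dir = undistortion_folder.rstrip("/") + "/pmvs/"
    return [pmvs2_binary, pmvs_dir, "option-all"]


if __name__ == "__main__":
    # Placeholder paths from the tutorial; only prints the command line.
    print(" ".join(pmvs2_command("./path/to/pmvs2", "/path/to/undistortion/folder")))
```

The returned list can be passed to subprocess.run once the placeholder paths are replaced with real ones.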

For large datasets, you probably want to first run CMVS to cluster the scene into more manageable parts and then run COLMAP or PMVS2. Please refer to the sample shell scripts in the undistortion output folder on how to run CMVS in combination with COLMAP or PMVS2. Moreover, a number of external libraries support COLMAP’s output.

Database Management

You can review and manage the imported cameras, images, and feature matches in the database management tool. Choose Processing > Manage database. In the opening dialog, you can see the list of imported images and cameras. You can view the features and matches for each image by clicking Show image and Overlapping images. Individual entries in the database tables can be modified by double clicking specific cells. Note that any changes to the database are only effective after clicking Save.

To share intrinsic camera parameters between arbitrary groups of images, select a single or multiple images, choose Set camera and set the camera_id, which corresponds to the unique camera_id column in the cameras table. You can also add new cameras with specific parameters. By setting the prior_focal_length flag to 0 or 1, you can give a hint whether the reconstruction algorithm should trust the focal length value. In case of a prior lab calibration, you want to set this flag to 1. Without prior knowledge about the focal length, it is recommended to set the focal length value to 1.25 * max(width_in_px, height_in_px).
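The focal length recommendation can be written as a small Python sketch. The function names are hypothetical; the second function pairs the focal length with the prior_focal_length flag, under the assumption that the flag should be 0 whenever the heuristic value is used instead of a calibrated one.

```python
def default_focal_length(width_in_px, height_in_px):
    """Heuristic focal length in pixels when nothing better is known."""
    return 1.25 * max(width_in_px, height_in_px)


def focal_length_with_prior(width_in_px, height_in_px, calibrated=None):
    """Return (focal_length, prior_focal_length flag).

    A calibrated value is trusted (flag 1); otherwise the heuristic
    value is returned with flag 0 (an assumption of this sketch).
    """
    if calibrated is not None:
        return calibrated, 1
    return default_focal_length(width_in_px, height_in_px), 0


if __name__ == "__main__":
    print(focal_length_with_prior(4000, 3000))
    print(focal_length_with_prior(4000, 3000, calibrated=3200.0))
```

For a 4000 x 3000 pixel image, the heuristic yields a focal length of 5000 pixels.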

The database management tool has only limited functionality and, for full control over the data, you must directly modify the SQLite database (see Database Format). By accessing the database directly, you can use COLMAP only for feature extraction and matching or you can import your own features and matches to only use COLMAP’s incremental reconstruction algorithm.
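As a first step toward direct database access, the following Python sketch opens a SQLite file with the standard sqlite3 module and reports the number of rows per table. It deliberately makes no assumption about COLMAP’s table names or columns, which are described in the Database Format section; table_row_counts is a hypothetical helper.

```python
import sqlite3


def table_row_counts(database_path):
    """Return a dict mapping each table name to its number of rows."""
    # Note: sqlite3.connect creates an empty file if the path does not exist.
    connection = sqlite3.connect(database_path)
    try:
        names = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            name: connection.execute(
                f'SELECT COUNT(*) FROM "{name}"'
            ).fetchone()[0]
            for name in names
        }
    finally:
        connection.close()
```

Running it on a copy of database.db is a safe way to explore the contents before modifying anything.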

Graphical and Command-line Interface

Most of COLMAP’s features are accessible from both the graphical and the command-line interface, which are both embedded in the same executable. You can provide the options directly as command-line arguments or you can provide a .ini project configuration file containing the options using the --project_path path/to/project.ini argument. To start the GUI application, please execute colmap gui or directly specify a project configuration as colmap gui --project_path path/to/project.ini to avoid tedious selection in the GUI. To list the different commands available from the command-line, execute colmap help. For example, to run feature extraction from the command-line, you must execute colmap feature_extractor. The Graphical User Interface and Command-line Interface sections provide more details about the available commands.
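When COLMAP is driven from a script, the command lines described above can be assembled programmatically. The Python sketch below only uses the commands and the --project_path argument named in this section; colmap_command is a hypothetical helper, and it assumes the executable is available on the PATH as colmap.

```python
def colmap_command(command, project_path=None, executable="colmap"):
    """Build the argument list for a COLMAP command.

    `command` is a sub-command such as "gui", "help" or
    "feature_extractor"; `project_path` optionally points to a
    .ini project configuration file.
    """
    args = [executable, command]
    if project_path is not None:
        args += ["--project_path", str(project_path)]
    return args


if __name__ == "__main__":
    # Only prints the command lines; nothing is executed here.
    print(" ".join(colmap_command("help")))
    print(" ".join(colmap_command("gui", "path/to/project.ini")))
    print(" ".join(colmap_command("feature_extractor", "path/to/project.ini")))
```

The resulting list can be handed to subprocess.run, which avoids shell quoting issues with paths that contain spaces.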

Footnotes

[1] VisualSfM’s [wu13] projection model applies the distortion to the measurements, whereas COLMAP applies it to the projection; hence the exported NVM file is not fully compatible with VisualSfM.