NetVLAD: CNN architecture for weakly supervised place recognition


Version 1.03 (04 Mar 2016)

  • If you used NetVLAD v1.01 or below, you need to upgrade your models using relja_simplenn_tidy

This code implements the NetVLAD layer and the weakly supervised training for place recognition presented in [1]. For the link to the paper, trained models and other data, see our project page: http://www.di.ens.fr/willow/research/netvlad/

NetVLAD is distributed under the MIT License (see the LICENCE file).

Setup

Dependencies

The code is written in MATLAB, and depends on the following libraries:

  1. relja_matlab v1.02 or above
  2. MatConvNet (requires v1.0-beta18 or above)
  3. Optional but highly recommended for speed: Yael_matlab (tested using version 438); it is not used for feature extraction (i.e. the feed-forward pass)

Data

Datasets

Visit our project page for information on how to get the datasets. You can also use your custom dataset by creating the appropriate MATLAB object: inherit from datasets/dbBase.m (instructions provided in the file's comments).
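
If it helps, here is a hypothetical custom dataset sketch. It only assumes the fields that appear elsewhere in this README (name, dbPath, qPath, dbImageFns, qImageFns); the comments in datasets/dbBase.m are the authoritative specification, including the ground-truth information needed for training and evaluation:

classdef dbMyPlaces < dbBase
    methods
        function db = dbMyPlaces()
            db.name       = 'myplaces';
            db.dbPath     = '/data/myplaces/db/';
            db.qPath      = '/data/myplaces/q/';
            db.dbImageFns = {'db0001.jpg'; 'db0002.jpg'}; % relative to dbPath
            db.qImageFns  = {'q0001.jpg'};                % relative to qPath
            % ... remaining fields (e.g. positions / ground truth), as
            % documented in dbBase.m
        end
    end
end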

Our trained networks

Download them from our project page.

If you want to train your networks

In [1] we always started from networks pretrained on other tasks (ImageNet / Places205); download these from the MatConvNet website. Download imagenet-caffe-ref and imagenet-vgg-verydeep-16 for the AlexNet and VGG-16 experiments, respectively.

However, one can also start from any custom CNN. Change loadNet.m to load your initial network.

Configure the NetVLAD library

Copy localPaths.m.setup into localPaths.m and edit the variables to point to dependencies, dataset locations, pretrained models, etc. (detailed information is provided in the file).
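
For illustration, a hypothetical localPaths.m showing only the two fields used by the snippets in this README (paths.ourCNNs, paths.outPrefix); the real localPaths.m.setup documents the full set of variables:

function paths = localPaths()
    paths.ourCNNs   = '/data/models/netvlad/'; % where the downloaded trained networks live
    paths.outPrefix = '/data/out/netvlad/';    % where computed features get written
end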

Run

See demo.m for examples of how to train and test the networks, as explained below. We use Tokyo as a running example, but everything is analogous if you use Pittsburgh (just change the dataset setup and use the appropriate networks).

The code samples below use the GPU by default; if you want to use the CPU instead (very slow, especially for training!), add 'useGPU', false to the affected function calls (trainWeakly, addPCA, serialAllFeats, computeRepresentation), as in the example below.
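
For example, a CPU-only forward pass for a single image (computeRepresentation is introduced below):

feats = computeRepresentation(net, im, 'useGPU', false); % run on the CPU instead of the GPU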

Note that if something fails (e.g. you are missing a dependency, your GPU runs out of RAM, you manually stop execution, etc.), make sure to delete any potentially corrupt files before rerunning the code. E.g. if you terminate feature extraction, the output file will be incomplete, so trying to perform testing will fail (files are never recomputed if they already exist).
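
For example, to clear a potentially incomplete feature file before recomputing it (dbFeatFn as defined in the testing steps below):

if exist(dbFeatFn, 'file'), delete(dbFeatFn); end % remove a partial output file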

Use/Test our networks

You can download our networks from the project page.

Set the MATLAB paths:

setup;

Load our network:

netID = 'vd16_tokyoTM_conv5_3_vlad_preL2_intra_white';
paths = localPaths();
load( sprintf('%s%s.mat', paths.ourCNNs, netID), 'net' );
net = relja_simplenn_tidy(net); % potentially upgrade the network to the latest version of NetVLAD / MatConvNet

Compute the image representation by simply running the forward pass using the network net on the appropriately normalized image (see computeRepresentation.m).

im = vl_imreadjpeg({which('football.jpg')});
im = im{1}; % slightly convoluted because we need the full image path for `vl_imreadjpeg`, while `imread` is not appropriate - see `help computeRepresentation`
feats = computeRepresentation(net, im); % add `'useGPU', false` if you want to use the CPU

To compute representations for many images, use the serialAllFeats function, which is much faster as it uses batches and moves the network to the GPU only once:

serialAllFeats(net, imPath, imageFns, outputFn);

imageFns is a cell array containing image file names relative to imPath (i.e. [imPath, imageFns{i}] is a valid JPEG image); the representations are saved in binary format (single-precision, 4-byte floats). The batch size used for computing the forward pass can be changed by adding the batchSize parameter, e.g. 'batchSize', 10. Note that if your input images are not all of the same size (they are in the place recognition datasets used here), you should set batchSize to 1.
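
The exact on-disk layout is not documented here beyond "single-precision floats", so the following reader is only a sketch: it assumes the file is a raw stream with the D values of one image followed by the next (D being the representation dimensionality, e.g. 4096 for the PCA+whitening networks described below):

D = 4096;                                       % assumed representation dimensionality
fid = fopen(outputFn, 'rb');
feats = fread(fid, [D, inf], 'single=>single'); % one image representation per column
fclose(fid);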

To test the network on a place recognition dataset, set up the test dataset:

dbTest = dbTokyo247();

Set the output filenames for the database/query image representations:

paths = localPaths();
dbFeatFn = sprintf('%s%s_%s_db.bin', paths.outPrefix, netID, dbTest.name);
qFeatFn  = sprintf('%s%s_%s_q.bin', paths.outPrefix, netID, dbTest.name);

Compute db/query image representations:

serialAllFeats(net, dbTest.dbPath, dbTest.dbImageFns, dbFeatFn, 'batchSize', 10); % adjust batchSize depending on your GPU / network size
serialAllFeats(net, dbTest.qPath, dbTest.qImageFns, qFeatFn, 'batchSize', 1); % Tokyo 24/7 query images have different resolutions so batchSize is constrained to 1

Measure the recall@N:

[recall, ~, ~, opts] = testFromFn(dbTest, dbFeatFn, qFeatFn);
plot(opts.recallNs, recall, 'ro-'); grid on; xlabel('N'); ylabel('Recall@N'); title(netID, 'Interpreter', 'none');
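
For intuition, recall@N is the fraction of queries for which at least one correct database match appears among the top N retrievals. A minimal sketch (not the actual testFromFn implementation; dists and isPos are assumed names):

% dists: nDb x nQ matrix of distances between database and query
% representations; isPos(iDb, iQ): true if database image iDb is a correct
% match for query iQ
Ns = [1, 2, 3, 4, 5, 10];
nQ = size(dists, 2);
recallAtN = zeros(numel(Ns), 1);
for iQ = 1:nQ
    [~, idx] = sort(dists(:, iQ), 'ascend'); % rank database images for this query
    for iN = 1:numel(Ns)
        if any(isPos(idx(1:Ns(iN)), iQ))     % a correct match within the top N?
            recallAtN(iN) = recallAtN(iN) + 1;
        end
    end
end
recallAtN = recallAtN / nQ;                  % fraction of queries with a hit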

To test smaller dimensionalities, all that needs to be done (only valid for NetVLAD+whitening networks!) is to keep the first D dimensions and L2-normalize. This is done automatically in testFromFn using the cropToDim option:

recall = testFromFn(dbTest, dbFeatFn, qFeatFn, [], 'cropToDim', 256);
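
For intuition, the manual equivalent (assuming a feats matrix with one image representation per column):

D = 256;                                           % target dimensionality
f = feats(1:D, :);                                 % keep the first D dimensions
f = bsxfun(@rdivide, f, sqrt(sum(f.^2, 1)) + eps); % re-L2-normalize each column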

It is also very easy to test our trained networks on the standard object/image retrieval benchmarks, using the same set of steps: load the network, construct the database, compute the features, run the evaluation. See demoRetrieval.m for details.

Train

Set the MATLAB paths:

setup;

Load the train and validation sets, e.g. for Tokyo Time Machine:

dbTrain = dbTokyoTimeMachine('train');
dbVal   = dbTokyoTimeMachine('val');

Run the training:

sessionID = trainWeakly(dbTrain, dbVal, ...
    'netID', 'vd16', 'layerName', 'conv5_3', 'backPropToLayer', 'conv5_1', ...
    'method', 'vlad_preL2_intra', ...
    'learningRate', 0.0001, ...
    'doDraw', true);

All arguments of trainWeakly are explained in more detail in the trainWeakly.m file; here is a brief overview of the essential ones:

  • netID: The name of the network (caffe for AlexNet, vd16 for verydeep-16, i.e. VGG-16)
  • layerName: The layer at which to crop the initial network; we always use the last convolutional layer (i.e. conv5 for caffe and conv5_3 for vd16)
  • backPropToLayer: The layer down to which to perform the learning. If not specified, the entire network is trained; see [1] for the analysis
  • method: The aggregation method to use for the image representation; the default is vlad_preL2_intra (i.e. NetVLAD with input features L2-normalized, and with intra-normalization of the NetVLAD vector), sketched after this list. You can also use max for max pooling, avg for average pooling, or other vlad variants (e.g. vlad_preL2 to disable intra-normalization)
  • learningRate: The learning rate for SGD
  • useGPU: Whether to use the GPU
  • doDraw: Whether to plot some performance curves as training progresses
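
For reference, here is a minimal sketch of what the vlad_preL2_intra aggregation computes, following [1]; it is illustrative rather than the actual layer implementation, and the variable names (X, C, A) are assumptions:

% X: D x N local descriptors from the cropped CNN; C: D x K cluster centres;
% A: K x N soft-assignment weights (in NetVLAD these come from a 1x1
% convolution followed by a softmax over the K clusters)
X = bsxfun(@rdivide, X, sqrt(sum(X.^2, 1)));       % preL2: L2-normalize descriptors
V = zeros(size(C));                                % D x K matrix of aggregated residuals
for k = 1:size(C, 2)
    R = bsxfun(@minus, X, C(:, k));                % residuals to centre k
    V(:, k) = R * A(k, :)';                        % soft-assignment-weighted sum
end
V = bsxfun(@rdivide, V, sqrt(sum(V.^2, 1)) + eps); % intra-normalization (per cluster)
v = V(:) / norm(V(:));                             % flatten and L2-normalize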

Other parameters are explained in trainWeakly.m, including SGD parameters (batch size, momentum, weight decay, learning rate schedule, ...), method parameters (margin size, number of negatives, size of the hard negative memory, ...), etc.
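
Among the method parameters, the margin and the negatives define the weakly supervised ranking loss of [1]; for one training tuple it can be sketched as follows (illustrative variable names; trainWeakly implements this inside the network):

% dPos / dNeg: squared representation distances from the query to its
% potential positives / hard negatives; m: the margin
dBest = min(dPos);                       % closest potential positive
loss  = sum( max(0, dBest + m - dNeg) ); % hinge per negative, summed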

The training periodically saves the latest network and performance curves in files whose names include the sessionID (can be specified, otherwise generated randomly) and the epoch number, e.g. 0fd5_ep000002_latest.mat, as well as a copy of that file for the latest epoch in 0fd5_latest.mat.

To find the best network, i.e. the one that performs best on the validation set (we use recall@N with N=5, but any value can be used), run:

[~, bestNet] = pickBestNet(sessionID);

Train PCA + whitening

The best performance is achieved if the dimensionality of the image representation is reduced using PCA together with whitening:

finalNet = addPCA(bestNet, dbTrain, 'doWhite', true, 'pcaDim', 4096);
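
For intuition, a minimal sketch of what PCA + whitening computes (addPCA presumably folds an equivalent projection into the returned network; F and pcaDim are assumed names):

% F: D x N matrix of training image representations; pcaDim: target dimension
mu = mean(F, 2);
Fc = bsxfun(@minus, F, mu);                           % centre the data
[U, S] = eig(Fc * Fc');                               % eigendecomposition of the scatter matrix
[lams, order] = sort(diag(S), 'descend');
U = U(:, order(1:pcaDim));                            % top pcaDim principal directions
P = diag(1 ./ sqrt(lams(1:pcaDim) + eps)) * U';       % whitening: rescale by 1/sqrt(eigenvalue)
proj = P * Fc;                                        % reduced, whitened representations
proj = bsxfun(@rdivide, proj, sqrt(sum(proj.^2, 1))); % final L2-normalization per image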

Additional information

More information is available in README_more.md and in the comments in the code itself.

References

[1] R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, J. Sivic. "NetVLAD: CNN architecture for weakly supervised place recognition", CoRR, abs/1511.07247, 2015

Changes

  • 1.03 (04 Mar 2016)

    • Fixed a bug in NetVLAD backprop
  • 1.02 (29 Feb 2016)

    • Adapts the code to account for major changes in matconvnet-1.0-beta17's SimpleNN
    • Removed the use of the redundant relja_simplenn since vl_simplenn has sufficient functionality now (from matconvnet-1.0-beta18)
  • 1.01 (29 Feb 2016)

    • Easier quick-start with computeRepresentation
    • Standard retrieval benchmarks (Oxford, Paris, Holidays) in demoRetrieval.m
    • Additional examples in demo.m: dimensionality reduction with NetVLAD, construction of off-the-shelf networks
  • 1.00 (04 Dec 2015)

    • Initial public release
