vector-index-bench/vibe


[Radar chart]

Vector Index Benchmark for Embeddings (VIBE) is an extensible benchmark for approximate nearest neighbor search methods, or vector indexes, using modern embedding datasets.


Overview

  • 📊 Modern vector index benchmark with embedding datasets
  • 🎯 Includes datasets for both in-distribution and out-of-distribution settings
  • 🏆 Includes the most comprehensive collection of state-of-the-art vector search algorithms
  • 💎 Support for quantized datasets in both 8-bit integer and binary precision
  • 🖥️ Support for HPC environments with Slurm and NUMA
  • 🚀 Support for GPU algorithms
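As a rough illustration of the two quantized precisions (a sketch only, not necessarily VIBE's exact quantization scheme): scalar 8-bit quantization linearly maps each float dimension into the signed byte range, while binary quantization keeps only one bit per dimension.

```python
# Sketch of two common embedding quantization schemes (illustrative only,
# not necessarily the exact scheme used by VIBE).

def quantize_int8(vector, lo, hi):
    """Linearly map floats in [lo, hi] to integers in [-128, 127]."""
    scale = 255.0 / (hi - lo)
    out = []
    for x in vector:
        q = round((x - lo) * scale) - 128
        out.append(max(-128, min(127, q)))
    return out

def quantize_binary(vector):
    """Keep only one bit per dimension: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]

v = [-1.0, -0.2, 0.0, 0.3, 1.0]
print(quantize_int8(v, -1.0, 1.0))   # integers spanning [-128, 127]
print(quantize_binary(v))            # [0, 0, 0, 1, 1]
```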

Results

The current VIBE results can be viewed on our website:

https://vector-index-bench.github.io

The website also features several other tools and visualizations to explore the results.

Publication

E. Jääsaari, V. Hyvönen, M. Ceccarello, T. Roos, M. Aumüller. VIBE: Vector Index Benchmark for Embeddings. arXiv preprint arXiv:2505.17810, 2025.

Authors

VIBE is maintained by Elias Jääsaari, Matteo Ceccarello, and Martin Aumüller.

Credits

The evaluation code and some algorithm implementations in VIBE are based on the ann-benchmarks project.

License

VIBE is available under the MIT License (see LICENSE). The pyyaml library is also distributed in the vibe folder under the MIT License.

Getting started

Requirements

VIBE runs each algorithm in an Apptainer (Singularity) container, so a working Apptainer installation is required. For example, to install Apptainer on Ubuntu:

sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer

Some algorithms may require that the CPU supports AVX-512 instructions and some algorithms may require an Intel CPU due to a dependency on Intel MKL. The GPU algorithms assume that an NVIDIA GPU is available.

Important

For accurate benchmarking, it is recommended to disable SMT/hyperthreading:

echo off | sudo tee /sys/devices/system/cpu/smt/control

On hybrid architectures (e.g., Intel Raptor Lake), it is recommended to disable efficiency (E) cores.

If not running in an HPC or cloud environment, it is also recommended to set the performance governor

sudo cpupower frequency-set -g performance

and to check that transparent huge pages are set to madvise or never:

cat /sys/kernel/mm/transparent_hugepage/enabled
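The file printed above lists all THP modes and marks the active one in brackets, e.g. `always [madvise] never`. A small helper to extract the active mode from that string (a sketch; the path is the standard Linux sysfs location):

```python
# Parse the active transparent-huge-pages mode from the contents of
# /sys/kernel/mm/transparent_hugepage/enabled, which looks like
# "always [madvise] never" with the active mode in brackets.

def active_thp_mode(contents: str) -> str:
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in: {contents!r}")

mode = active_thp_mode("always [madvise] never")
print(mode)                           # madvise
print(mode in ("madvise", "never"))   # True: a benchmark-safe setting
```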

Building library images

Building all library images can be done using

./install.sh

The script can be used to either build images for all available libraries (./install.sh) or an image for a single library (e.g. ./install.sh --algorithm faiss).

Tip

install.sh takes an argument --build-dir that specifies the temporary build directory. For example, to speed up the build in a cluster environment, you can set the build directory to a location on an SSD while the project files are on a slower storage medium.

Tip

See an example Slurm job for building the libraries using Slurm.

Running benchmarks

The benchmarks for a single dataset can be run using run.py. For example:

python3 run.py --dataset agnews-mxbai-1024-euclidean

The run.py script does not depend on any external libraries and can therefore be used without a container or a virtual environment.

Common options for run.py:

  • --parallelism n: Use n processes for benchmarking.
  • --module mod: Run the benchmark only for algorithms in module (library) mod.
  • --algorithm algo: Run the benchmark only for algorithm algo.
  • --count k: Run the benchmarks using k nearest neighbors (default 100).
  • --gpu: Run the benchmark in GPU mode.

For all options, see

python3 run.py --help
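For orientation, the common options above can be modeled with a minimal argparse sketch. This is a hypothetical reconstruction, not the actual run.py; the real script defines more options (see its --help output):

```python
import argparse

# Hypothetical sketch of run.py's common options; the real script
# defines more (see `python3 run.py --help`).
parser = argparse.ArgumentParser(description="Run VIBE benchmarks for one dataset")
parser.add_argument("--dataset", required=True, help="dataset name")
parser.add_argument("--parallelism", type=int, default=1,
                    help="number of benchmark processes")
parser.add_argument("--module", help="only run algorithms from this module (library)")
parser.add_argument("--algorithm", help="only run this algorithm")
parser.add_argument("--count", type=int, default=100,
                    help="number of nearest neighbors")
parser.add_argument("--gpu", action="store_true", help="run in GPU mode")

args = parser.parse_args(["--dataset", "agnews-mxbai-1024-euclidean",
                          "--parallelism", "8"])
print(args.dataset, args.parallelism, args.count, args.gpu)
```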

The benchmark should take less than 24 hours to run for a given dataset when using a parallelism greater than 12. We recommend having at least 16 GB of memory per core used.

Tip

See an example Slurm job for running the benchmark using Slurm.

Plotting results

You should first build the plot.sif image:

singularity build plot.sif plot.def

Before plotting, the current results must first be exported:

./export_results.sh --parallelism 8

The results for a dataset can then be plotted, e.g.:

./plot.sh --dataset agnews-mxbai-1024-euclidean

To plot the radar chart above, use:

./plot.sh --plot-type radar

For all available options, see:

./plot.sh --help

Tip

You can also use uv to run export_results.py and plot.py directly, without building the container image, if preferred. The arguments for these scripts are the same as above.

Creating datasets from scratch

The benchmark code downloads precomputed embedding datasets. However, the datasets can also be recreated from scratch, and it is also possible to create new datasets by modifying the datasets.py file.

Creating the datasets can be done using create_dataset.sh. It first requires that dataset.sif is built:

singularity build dataset.sif dataset.def

The VIBE_CACHE environment variable should be set to a cache directory with at least 200 GB of free space when creating image embeddings using the Landmark or ImageNet datasets. Datasets can then be created using the --dataset argument (the --nv argument specifies that an available GPU can be used):

export VIBE_CACHE=$LOCAL_SCRATCH
./create_dataset.sh --singularity-args "--bind $LOCAL_SCRATCH:$LOCAL_SCRATCH --nv" --dataset agnews-mxbai-1024-euclidean

Tip

See an example Slurm job for creating datasets using Slurm.

Adding a new method to the benchmark

Add your algorithm in the folder vibe/algorithms/{METHOD}/ by providing

  • A Python wrapper in module.py
  • A Singularity container definition in image.def
  • A hyperparameter grid in config.yml
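Since VIBE's evaluation code is based on ann-benchmarks, the Python wrapper is expected to follow a similar shape. Below is a minimal sketch assuming an ann-benchmarks-style interface (fit / set_query_arguments / query); the class name and exact signatures here are illustrative, so check an existing wrapper under vibe/algorithms/ for the real base class:

```python
# Sketch of a module.py wrapper, assuming an ann-benchmarks-style
# interface (fit / set_query_arguments / query). Check an existing
# wrapper under vibe/algorithms/ for the exact base class and signatures.
import heapq
import math

class BruteForceExample:
    """Toy 'index' that answers queries by exact linear scan."""

    def __init__(self, metric):
        self.metric = metric  # e.g. "euclidean"
        self.data = None

    def fit(self, X):
        # A real wrapper builds its index here; we just keep the vectors.
        self.data = X

    def set_query_arguments(self, *args):
        # Search-time hyperparameters (e.g. ef, nprobe) would go here.
        pass

    def query(self, q, k):
        # Return the indices of the k nearest vectors to q.
        dists = ((math.dist(q, v), i) for i, v in enumerate(self.data))
        return [i for _, i in heapq.nsmallest(k, dists)]

index = BruteForceExample("euclidean")
index.fit([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])
print(index.query([0.0, 0.1], k=2))  # [0, 2]
```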

Evaluation

In-distribution datasets

Name                                Type   n          d     Distance
agnews-mxbai-1024-euclidean         Text   769,382    1024  euclidean
arxiv-nomic-768-normalized          Text   1,344,643  768   any
ccnews-nomic-768-normalized         Text   495,328    768   any
celeba-resnet-2048-cosine           Image  201,599    2048  cosine
codesearchnet-jina-768-cosine       Code   1,374,067  768   cosine
glove-200-cosine                    Word   1,192,514  200   cosine
gooaq-distilroberta-768-normalized  Text   1,475,024  768   any
imagenet-clip-512-normalized        Image  1,281,167  512   any
landmark-dino-768-cosine            Image  760,757    768   cosine
landmark-nomic-768-normalized       Image  760,757    768   any
simplewiki-openai-3072-normalized   Text   260,372    3072  any
yahoo-minilm-384-normalized         Text   677,305    384   any

Out-of-distribution datasets

Name                           Type           n          d    Distance
coco-nomic-768-normalized      Text-to-Image  282,360    768  any
imagenet-align-640-normalized  Text-to-Image  1,281,167  640  any
laion-clip-512-normalized      Text-to-Image  1,000,448  512  any
yandex-200-cosine              Text-to-Image  1,000,000  200  cosine
yi-128-ip                      Attention      187,843    128  IP
llama-128-ip                   Attention      256,921    128  IP
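Many datasets above list the distance as "any" because their embeddings are L2-normalized: for unit vectors, ‖a − b‖² = 2(1 − ⟨a, b⟩), so Euclidean, cosine, and inner-product searches all produce the same nearest-neighbor ranking. A quick numeric check of the identity:

```python
# For unit-norm vectors, ||a - b||^2 = 2 * (1 - <a, b>), so euclidean,
# cosine, and inner-product rankings coincide on normalized embeddings.
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([3.0, 4.0])
b = normalize([1.0, 2.0])

dot = sum(x * y for x, y in zip(a, b))
sq_euclid = sum((x - y) ** 2 for x, y in zip(a, b))

print(abs(sq_euclid - 2 * (1 - dot)) < 1e-12)  # True
```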

Algorithms

Method            Version
ANNOY             1.17.3
FALCONN++         git+5fd3f17
FlatNav           0.1.2
CAGRA             25.08.00
GGNN              0.9
GLASS             git+8c69018
HNSW              0.8.0
IVF (Faiss)       1.12.0
IVF-PQ (Faiss)    1.12.0
LVQ (SVS)         0.0.9
LeanVec (SVS)     0.0.9
LoRANN            0.4
MLANN             git+f5d966b
MRPT              2.0.2
NGT-ONNG          2.4.5
NGT-QG            2.4.5
NSG               1.12.0
PUFFINN           git+fd86b0d
PyNNDescent       0.5.13
RoarGraph         git+f2b49b6
ScaNN             1.4.2
SymphonyQG        git+32a0019
Vamana (DiskANN)  0.7.0
