Vector Index Benchmark for Embeddings (VIBE) is an extensible benchmark for approximate nearest neighbor search methods, or vector indexes, using modern embedding datasets.
vector-index-bench/vibe
- 📊 Modern vector index benchmark with embedding datasets
- 🎯 Includes datasets for both in-distribution and out-of-distribution settings
- 🏆 Includes the most comprehensive collection of state-of-the-art vector search algorithms
- 💎 Support for quantized datasets in both 8-bit integer and binary precision
- 🖥️ Support for HPC environments with Slurm and NUMA
- 🚀 Support for GPU algorithms
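The two quantized precisions can be illustrated with a short NumPy sketch (this is an illustrative example, not VIBE's actual preprocessing code): 8-bit scalar quantization rescales each vector into the int8 range, while binary quantization keeps only the sign bit of each coordinate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)  # toy "embeddings"

# 8-bit scalar quantization: map each vector's values into [-127, 127]
scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
x_int8 = np.round(x / scale).astype(np.int8)

# Binary quantization: keep only the sign of each coordinate, packed into bits
x_bin = np.packbits(x > 0, axis=1)

print(x_int8.shape, x_int8.dtype)  # (4, 8) int8
print(x_bin.shape, x_bin.dtype)    # (4, 1) uint8 (8 dims -> 1 byte per vector)
```

Dequantizing with `x_int8 * scale` recovers the original vectors up to a per-row rounding error, which is why int8 indexes trade a small accuracy loss for a 4x memory reduction over float32.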
The current VIBE results can be viewed on our website:
https://vector-index-bench.github.io
The website also features several other tools and visualizations to explore the results.
E. Jääsaari, V. Hyvönen, M. Ceccarello, T. Roos, M. Aumüller. VIBE: Vector Index Benchmark for Embeddings. arXiv preprint arXiv:2505.17810, 2025.
VIBE is maintained by Elias Jääsaari, Matteo Ceccarello, and Martin Aumüller.
The evaluation code and some algorithm implementations in VIBE are based on the ann-benchmarks project.
VIBE is available under the MIT License (see LICENSE). The pyyaml library is also distributed in the vibe folder under the MIT License.
- Linux
- Apptainer (or Singularity)
- Python 3.6 or newer
For example, to install Apptainer on Ubuntu:

```shell
sudo add-apt-repository -y ppa:apptainer/ppa
sudo apt update
sudo apt install -y apptainer
```
Some algorithms may require that the CPU supports AVX-512 instructions and some algorithms may require an Intel CPU due to a dependency on Intel MKL. The GPU algorithms assume that an NVIDIA GPU is available.
> [!IMPORTANT]
> For accurate benchmarking, it is recommended to disable SMT/hyperthreading:
>
> ```shell
> echo off | sudo tee /sys/devices/system/cpu/smt/control
> ```
>
> On hybrid architectures (e.g., Intel Raptor Lake), it is recommended to disable the efficiency (E) cores.
>
> If not running in an HPC or cloud environment, it is also recommended to set the performance governor
>
> ```shell
> sudo cpupower frequency-set -g performance
> ```
>
> and to check that transparent huge pages are set to `madvise` or `never`:
>
> ```shell
> cat /sys/kernel/mm/transparent_hugepage/enabled
> ```
Building all library images can be done using

```shell
./install.sh
```

The script can be used to either build images for all available libraries (`./install.sh`) or an image for a single library (e.g. `./install.sh --algorithm faiss`).
> [!TIP]
> `install.sh` takes an argument `--build-dir` that specifies the temporary build directory. For example, to speed up the build in a cluster environment, you can set the build directory to a location on an SSD while the project files are on a slower storage medium.
> [!TIP]
> See an example Slurm job for building the libraries using Slurm.
The benchmarks for a single dataset can be run using `run.py`. For example:

```shell
python3 run.py --dataset agnews-mxbai-1024-euclidean
```
The `run.py` script does not depend on any external libraries and can therefore be used without a container or a virtual environment.
Common options for `run.py`:

- `--parallelism n`: Use `n` processes for benchmarking.
- `--module mod`: Run the benchmark only for algorithms in module (library) `mod`.
- `--algorithm algo`: Run the benchmark for only algorithm `algo`.
- `--count k`: Run the benchmarks using `k` nearest neighbors (default 100).
- `--gpu`: Run the benchmark in GPU mode.
For all options, see

```shell
python3 run.py --help
```
The benchmark should take less than 24 hours to run for a given dataset when using parallelism > 12. We recommend having at least 16 GB of memory per core used.
> [!TIP]
> See an example Slurm job for running the benchmark using Slurm.
You should first build the `plot.sif` image:

```shell
singularity build plot.sif plot.def
```
Before plotting, the current results must first be exported:

```shell
./export_results.sh --parallelism 8
```
The results for a dataset can then be plotted with, e.g.:

```shell
./plot.sh --dataset agnews-mxbai-1024-euclidean
```
To plot the radar chart above, use:

```shell
./plot.sh --plot-type radar
```
For all available options, see:

```shell
./plot.sh --help
```

> [!TIP]
> You can also use `uv` to directly run `export_results.py` and `plot.py` without building the container image if preferable. The arguments for these scripts are the same as above.
The benchmark code downloads precomputed embedding datasets. However, the datasets can also be recreated from scratch, and it is also possible to create new datasets by modifying the `datasets.py` file.
Creating the datasets can be done using `create_dataset.sh`. It first requires that `dataset.sif` is built:

```shell
singularity build dataset.sif dataset.def
```
The `VIBE_CACHE` environment variable should be set to a cache directory with at least 200 GB of free space when creating image embeddings using the Landmark or ImageNet datasets. Datasets can then be created using the `--dataset` argument (the `--nv` argument specifies that an available GPU can be used):

```shell
export VIBE_CACHE=$LOCAL_SCRATCH
./create_dataset.sh --singularity-args "--bind $LOCAL_SCRATCH:$LOCAL_SCRATCH --nv" --dataset agnews-mxbai-1024-euclidean
```
> [!TIP]
> See an example Slurm job for creating datasets using Slurm.
Add your algorithm in the folder `vibe/algorithms/{METHOD}/` by providing

- a Python wrapper in `module.py`
- a Singularity container definition in `image.def`
- a hyperparameter grid in `config.yml`
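As a rough sketch of what a `module.py` wrapper looks like, here is a toy brute-force "index" in plain NumPy. The method names (`fit` / `set_query_arguments` / `query`) follow the ann-benchmarks convention that VIBE builds on, but they are assumptions here: check an existing module under `vibe/algorithms/` for the exact interface and base class.

```python
import numpy as np

class BruteForce:
    """Toy wrapper in the style of an ann-benchmarks module.py.

    A real wrapper would delegate to the library being benchmarked;
    this one just scans all vectors, which is only useful as a sketch."""

    def __init__(self, metric):
        self.metric = metric

    def fit(self, X):
        # "Index construction": a real wrapper builds the ANN index here.
        self.X = np.asarray(X, dtype=np.float32)

    def set_query_arguments(self, *args):
        # A real index would set search-time hyperparameters (e.g. ef, nprobe).
        pass

    def query(self, q, k):
        # Exact search: return indices of the k nearest vectors to q.
        d = np.linalg.norm(self.X - q, axis=1)
        return np.argsort(d)[:k]
```

The hyperparameter grid in `config.yml` then enumerates the arguments that the benchmark harness passes to the constructor and to `set_query_arguments`.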
| Name | Type | n | d | Distance |
|---|---|---|---|---|
| agnews-mxbai-1024-euclidean | Text | 769,382 | 1024 | euclidean |
| arxiv-nomic-768-normalized | Text | 1,344,643 | 768 | any |
| ccnews-nomic-768-normalized | Text | 495,328 | 768 | any |
| celeba-resnet-2048-cosine | Image | 201,599 | 2048 | cosine |
| codesearchnet-jina-768-cosine | Code | 1,374,067 | 768 | cosine |
| glove-200-cosine | Word | 1,192,514 | 200 | cosine |
| gooaq-distilroberta-768-normalized | Text | 1,475,024 | 768 | any |
| imagenet-clip-512-normalized | Image | 1,281,167 | 512 | any |
| landmark-dino-768-cosine | Image | 760,757 | 768 | cosine |
| landmark-nomic-768-normalized | Image | 760,757 | 768 | any |
| simplewiki-openai-3072-normalized | Text | 260,372 | 3072 | any |
| yahoo-minilm-384-normalized | Text | 677,305 | 384 | any |
| Name | Type | n | d | Distance |
|---|---|---|---|---|
| coco-nomic-768-normalized | Text-to-Image | 282,360 | 768 | any |
| imagenet-align-640-normalized | Text-to-Image | 1,281,167 | 640 | any |
| laion-clip-512-normalized | Text-to-Image | 1,000,448 | 512 | any |
| yandex-200-cosine | Text-to-Image | 1,000,000 | 200 | cosine |
| yi-128-ip | Attention | 187,843 | 128 | IP |
| llama-128-ip | Attention | 256,921 | 128 | IP |
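The datasets marked with distance "any" have unit-normalized embeddings, and for unit vectors cosine distance, Euclidean distance, and (negative) inner product all induce the same neighbor ranking, since ||q - x||² = 2 - 2⟨q, x⟩. A short NumPy check of this identity (illustrative only, not part of VIBE):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
q = rng.standard_normal(16)
q /= np.linalg.norm(q)

# For unit vectors: ||q - x||^2 = 2 - 2 <q, x>, so the rankings agree.
by_ip = np.argsort(-X @ q)                            # descending inner product
by_euclid = np.argsort(np.linalg.norm(X - q, axis=1))  # ascending distance
assert np.array_equal(by_ip, by_euclid)
```

This is why such datasets can be benchmarked with any of the three distance functions.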
| Method | Version |
|---|---|
| ANNOY | 1.17.3 |
| FALCONN++ | git+5fd3f17 |
| FlatNav | 0.1.2 |
| CAGRA | 25.08.00 |
| GGNN | 0.9 |
| GLASS | git+8c69018 |
| HNSW | 0.8.0 |
| IVF (Faiss) | 1.12.0 |
| IVF-PQ (Faiss) | 1.12.0 |
| LVQ (SVS) | 0.0.9 |
| LeanVec (SVS) | 0.0.9 |
| LoRANN | 0.4 |
| MLANN | git+f5d966b |
| MRPT | 2.0.2 |
| NGT-ONNG | 2.4.5 |
| NGT-QG | 2.4.5 |
| NSG | 1.12.0 |
| PUFFINN | git+fd86b0d |
| PyNNDescent | 0.5.13 |
| RoarGraph | git+f2b49b6 |
| ScaNN | 1.4.2 |
| SymphonyQG | git+32a0019 |
| Vamana (DiskANN) | 0.7.0 |