Looking for the bottleneck in a 25M dataset #2608

valkum started this conversation in General

Hey, we are evaluating LanceDB to replace our internal in-memory solution for better scalability.
Our test dataset contains 25M rows with 6 columns (String, String, Bool, Bool, String, Vector).
We are currently trying to find the best index settings, as the default values resulted in poor query performance (>60s per query).

Our current index is constructed as follows:

lancedb::index::vector::IvfHnswPqIndexBuilder::default()
    // Our embeddings are normalized. Thus cosine similarity is the dot product.
    .distance_type(lancedb::DistanceType::Dot)
    // HNSW settings we currently use
    .ef_construction(100)
    .num_edges(32)
    // We tried the default but analyze reported a huge scan
    .num_partitions(16)
    // Optimal for 384 dims
    .num_sub_vectors(24),
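
For completeness, this builder gets applied via create_index; a minimal sketch, assuming an open lancedb Connection `db`, a table named "domains", and a vector column named "vector" (all placeholder names), with the create_index / Index::IvfHnswPq API quoted from memory rather than from the code actually used:

    use lancedb::index::Index;
    use lancedb::index::vector::IvfHnswPqIndexBuilder;

    // Sketch: build the IVF_HNSW_PQ index with the settings above.
    // "domains" / "vector" are placeholder table and column names.
    let table = db.open_table("domains").execute().await?;
    table
        .create_index(
            &["vector"],
            Index::IvfHnswPq(
                IvfHnswPqIndexBuilder::default()
                    .distance_type(lancedb::DistanceType::Dot)
                    .ef_construction(100)
                    .num_edges(32)
                    .num_partitions(16)
                    .num_sub_vectors(24),
            ),
        )
        .execute()
        .await?;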

The test was done on an M1 Max MacBook Pro with 64 GB RAM.
analyze_plan reported this:

AnalyzeExec verbose=true, metrics=[]
  TracedExec, metrics=[]
    ProjectionExec: expr=[registrable@2 as registrable, etld@3 as etld, is_market@4 as is_market, is_expiring@5 as is_expiring, domain@6 as domain, vector@7 as vector, _distance@0 as _distance], metrics=[output_rows=10, elapsed_compute=3.791µs]
      Take: columns="_distance, _rowid, (registrable), (etld), (is_market), (is_expiring), (domain), (vector)", metrics=[output_rows=10, elapsed_compute=3.163863667s, batches_processed=1, bytes_read=23222, iops=92, requests=77]
        CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=10, elapsed_compute=3.376µs]
          GlobalLimitExec: skip=0, fetch=10, metrics=[output_rows=10, elapsed_compute=2.042µs]
            SortExec: TopK(fetch=10), expr=[_distance@0 ASC NULLS LAST], preserve_partitioning=[false], metrics=[output_rows=10, elapsed_compute=129.75µs, row_replacements=29]
              ANNSubIndex: name=vector_idx, k=10, deltas=1, metrics=[output_rows=160, elapsed_compute=111.847289373s, index_comparisons=0, indices_loaded=0, partitions_searched=16, parts_loaded=9]
                ANNIvfPartition: uuid=e76bf63c-8e7b-421c-a5f8-fe2f13fd9f12, minimum_nprobes=20, maximum_nprobes=Some(20), deltas=1, metrics=[output_rows=1, elapsed_compute=18.042µs, deltas_searched=1, index_comparisons=0, indices_loaded=0, partitions_ranked=16, parts_loaded=0]

Note the ANNSubIndex step, which takes 111.847289373s. We got similar results with twice as many partitions or with the default of 7.5 (computed by suggested_num_partitions_for_hnsw).
The real (wall-clock) time is more like 50s (I guess whatever measures time in ANNSubIndex uses CPU time instead of real time; a rough timing sketch is below).
A Python-based usearch index over the same dataset, backed by SQLite for the payload, resolves these queries in <2s.
Something seems off with our LanceDB test, but we can't figure out what it is.
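
For reference, a rough sketch of timing one query wall-clock, independent of the plan's elapsed_compute metrics. `table` (an open lancedb Table) and `query_vec` (a 384-dim Vec<f32>) are placeholders, and the query-builder traits and methods are quoted from the Rust SDK as remembered, so treat the exact signatures as assumptions:

    use std::time::Instant;
    use futures::TryStreamExt;                        // for try_collect on the result stream
    use lancedb::query::{ExecutableQuery, QueryBase}; // traits providing limit()/execute()

    // Sketch: wall-clock timing of a single ANN query.
    let start = Instant::now();
    let batches = table
        .query()
        .nearest_to(query_vec.as_slice())?  // top-k vector search on the "vector" column
        .limit(10)
        .execute()
        .await?
        .try_collect::<Vec<_>>()
        .await?;
    let rows: usize = batches.iter().map(|b| b.num_rows()).sum();
    println!("rows={rows}, wall-clock={:?}", start.elapsed());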

Do you have any idea?


Replies: 3 comments 4 replies

@valkum

I now suspect the individual partitions were too big to stay in cache and large enough to take a substantial amount of time to load.
With num_partitions(1) we get usearch-like speeds at the cost of having a single HNSW index in memory (which is partially OK for our use case, especially with SQ and PQ).
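
For reference, that variant corresponds to the builder from the original post with only the partition count changed (a sketch, not necessarily the exact code that was run):

    // Single IVF partition, i.e. one HNSW graph over the whole dataset
    // (sketch of the configuration described above).
    lancedb::index::vector::IvfHnswPqIndexBuilder::default()
        .distance_type(lancedb::DistanceType::Dot)
        .ef_construction(100)
        .num_edges(32)
        .num_partitions(1)   // one partition: the entire sub-index must fit in cache/memory
        .num_sub_vectors(24)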

0 replies
@michael-lancedb

@valkum the biggest red flag to me here is your IVF config ...

With only 16 IVF partitions and nprobes=20, you’re effectively probing all lists on every query. That defeats IVF’s selectivity and forces many sub-index loads (9 in this run), which is the real cost (I/O & cold cache).

However, nprobes = 1 will likely not be accurate enough, because a single partition doesn't give comprehensive coverage.
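
For context, nprobes is set on the query rather than on the index; a rough sketch (the VectorQuery::nprobes method name is assumed from the Rust SDK, and `table` / `query_vec` are placeholders):

    use lancedb::query::{ExecutableQuery, QueryBase};

    // Sketch: with num_partitions = 16, nprobes = 20 means every IVF list is
    // scanned; with a larger partition count the same nprobes touches only a
    // small fraction of the lists.
    let results = table
        .query()
        .nearest_to(query_vec.as_slice())?
        .nprobes(20)   // number of IVF partitions to search per query
        .limit(10)
        .execute()
        .await?;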

Does any of this ring true for you?

2 replies
@valkum

I mean, the number of IVF partitions is supposed to be in that range when using HNSW. The default for an HNSW index is 7 for 25M rows with a dimension of 384. The nprobes value is also the default.

Which values would you suggest for 25M or more rows with a dimension of 384?

@michael-lancedb

I enjoyed getting into more details on this with you yesterday. Thanks for the continuing updates!

@valkum

I ran some tests against our dataset with the following configurations today.

This time on an Apple M1 Max with 64 GB RAM.

1000 Partitions

With an increased number of partitions:

IvfHnswPqIndexBuilder::default()
    .distance_type(lancedb::DistanceType::Cosine)
    .ef_construction(128)
    .num_edges(32)
    .num_partitions(1000)

a single analyze looks like:

AnalyzeExec verbose=true, metrics=[], cumulative_cpu=467.131375ms
  TracedExec, metrics=[], cumulative_cpu=467.131375ms
    ProjectionExec: expr=[registrable@2 as registrable, etld@3 as etld, is_market@4 as is_market, is_expiring@5 as is_expiring, domain@6 as domain, vector@7 as vector, _distance@0 as _distance], metrics=[output_rows=10, elapsed_compute=1.708µs], cumulative_cpu=467.131375ms
      Take: columns="_distance, _rowid, (registrable), (etld), (is_market), (is_expiring), (domain), (vector)", metrics=[output_rows=10, elapsed_compute=454.167µs, batches_processed=1, bytes_read=0, iops=0, requests=0], cumulative_cpu=467.129667ms
        CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=10, elapsed_compute=4.21µs], cumulative_cpu=466.6755ms
          SortExec: TopK(fetch=10), expr=[_distance@0 ASC NULLS LAST, _rowid@1 ASC NULLS LAST], preserve_partitioning=[false], filter=[_distance@0 < 0.83736986 OR _distance@0 = 0.83736986 AND _rowid@1 < 38655517994], metrics=[output_rows=10, elapsed_compute=171.831µs, row_replacements=20], cumulative_cpu=466.67129ms
            ANNSubIndex: name=vector_idx, k=10, deltas=1, metrics=[output_rows=200, elapsed_compute=466.379709ms, index_comparisons=0, indices_loaded=0, partitions_searched=20, parts_loaded=19], cumulative_cpu=466.499459ms
              ANNIvfPartition: uuid=718b0094-eaef-4960-a9a5-ad0f60ba83f4, minimum_nprobes=20, maximum_nprobes=Some(20), deltas=1, metrics=[output_rows=1, elapsed_compute=119.75µs, deltas_searched=1, index_comparisons=0, indices_loaded=0, partitions_ranked=1000, parts_loaded=0], cumulative_cpu=119.75µs

For 111 test queries with 10 iterations each, we got a median latency of 5ms and a p99 of 578ms.

1 Probe

With the number of nprobes decreased to 1:

IvfHnswPqIndexBuilder::default()
    .distance_type(lancedb::DistanceType::Cosine)
    .ef_construction(128)
    .num_edges(32),

a single analyze looks like:

AnalyzeExec verbose=true, metrics=[], cumulative_cpu=5.735049958s
  TracedExec, metrics=[], cumulative_cpu=5.735049958s
    ProjectionExec: expr=[registrable@2 as registrable, etld@3 as etld, is_market@4 as is_market, is_expiring@5 as is_expiring, domain@6 as domain, vector@7 as vector, _distance@0 as _distance], metrics=[output_rows=5, elapsed_compute=1.75µs], cumulative_cpu=5.735049958s
      Take: columns="_distance, _rowid, (registrable), (etld), (is_market), (is_expiring), (domain), (vector)", metrics=[output_rows=5, elapsed_compute=478.667µs, batches_processed=1, bytes_read=0, iops=0, requests=0], cumulative_cpu=5.735048208s
        CoalesceBatchesExec: target_batch_size=1024, metrics=[output_rows=5, elapsed_compute=4.249µs], cumulative_cpu=5.734569541s
          SortExec: TopK(fetch=5), expr=[_distance@0 ASC NULLS LAST, _rowid@1 ASC NULLS LAST], preserve_partitioning=[false], filter=[_distance@0 < 0.85125685 OR _distance@0 = 0.85125685 AND _rowid@1 < 30065359933], metrics=[output_rows=5, elapsed_compute=40.501µs, row_replacements=5], cumulative_cpu=5.734565292s
            ANNSubIndex: name=vector_idx, k=5, deltas=1, metrics=[output_rows=5, elapsed_compute=5.734424s, index_comparisons=0, indices_loaded=0, partitions_searched=1, parts_loaded=1], cumulative_cpu=5.734524791s
              ANNIvfPartition: uuid=a3d7a4a7-28a7-4682-8462-1882713b7ca0, minimum_nprobes=1, maximum_nprobes=Some(1), deltas=1, metrics=[output_rows=1, elapsed_compute=100.791µs, deltas_searched=1, index_comparisons=0, indices_loaded=0, partitions_ranked=7, parts_loaded=0], cumulative_cpu=100.791µs

For 5 test queries (they were slow, so we reduced the amount) with 10 iterations each, we got a median latency of 4ms and a p99 of 8s 960ms.

I will run some more tests by the end of Monday, including:

  • increasing the index_cache_size to 20GB (see the sketch after this list)
  • using the complete default values (I noted that ef_construction is 300 by default, which suggests the default index should have better quality).
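
A sketch of where that cache setting would go, assuming the OpenTableBuilder::index_cache_size option in the Rust SDK; as far as I know it takes a number of cache entries rather than a byte size, so a "20GB" budget has to be translated into an entry count:

    // Sketch: open the table with a larger index cache. "domains" is a placeholder
    // table name and 4096 an arbitrary entry count chosen for illustration.
    let table = db
        .open_table("domains")
        .index_cache_size(4096)   // unit is cache entries, not bytes
        .execute()
        .await?;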
2 replies
@michael-lancedb

Thanks for these updates, interesting results so far. I suspect the biggest difference in p99 between those tests is that with only 1 nprobe and 5x10 tests over 1000 partitions you weren't able to get much cache advantage, but even so, 8s over that dataset seems excessive.

If you have the appetite for it, it would also be interesting to see how the same tests perform when using an IVF-PQ index rather than the IVF-HNSW.
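
For reference, an IVF-PQ variant of the same index would look roughly like this (a sketch; the 1000-partition count simply mirrors the earlier IVF-HNSW-PQ test and is not a recommendation, and `table` is a placeholder):

    use lancedb::index::Index;
    use lancedb::index::vector::IvfPqIndexBuilder;

    // Sketch: IVF_PQ comparison index with settings mirroring the HNSW test above.
    table
        .create_index(
            &["vector"],
            Index::IvfPq(
                IvfPqIndexBuilder::default()
                    .distance_type(lancedb::DistanceType::Cosine)
                    .num_partitions(1000)
                    .num_sub_vectors(24),
            ),
        )
        .execute()
        .await?;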

@valkum

I will get back to you with those numbers. So far I was only able to run the test with the complete default values, which showed timings similar to those in the initial post.
I hope to find time to run these soon.

2 participants: @valkum, @michael-lancedb
