Strange performance with index vs no index on 50K vectors data set #2382
-
I recently compared LanceDB to some other solutions using VectorDBBench. I'll just share the results rendered using my own UI. I'm sure you'll see what I mean: there are unexpected differences basically across the board. As I was working on the LanceDB integration for the benchmark, the naming and data for some things changed slightly, which is why the labels are sometimes inconsistent. But you can infer from the results that anything labeled "no index" was run without an index, and anything labeled "autoindex" used an index created with no parameters provided during index creation. You can find the implementation used in the benchmark here: https://github.com/zilliztech/VectorDBBench/blob/main/vectordb_bench/backend/clients/lancedb/lancedb.py
Replies: 1 comment 6 replies
-
Thanks for doing this. A few questions to clarify the setup.
The QPS and latency seem a bit off if this is a 1M/10M 768D dataset on disk.
-
I just realized that I forgot to select only "id" in my implementation: zilliztech/VectorDBBench#525. But even with that fix, latency with the index remains high (it dropped from ~540ms to ~430ms).
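For readers following along, the change described above (returning only the "id" column instead of every column, stored vectors included) could look roughly like the sketch below. This is not the code from the linked PR: `search_ids` is a hypothetical helper, and `table` is assumed to be a LanceDB table opened elsewhere whose query builder offers `search`, `select`, `limit`, and `to_list`.

```python
from typing import Any, List, Sequence


def search_ids(table: Any, query: Sequence[float], k: int = 10) -> List[Any]:
    """Return only the ids of the k nearest rows (hypothetical helper).

    `table` is assumed to be an already opened LanceDB table that has an
    "id" column; nothing here opens a database or creates an index.
    """
    rows = (
        table.search(list(query))
        .select(["id"])  # project just the id column, not vectors or payload
        .limit(k)
        .to_list()
    )
    # Each row is a dict; extra keys such as a distance column are ignored.
    return [row["id"] for row in rows]
```

Projecting a single small column avoids reading and materializing the 768-dimensional vectors for every hit, which is consistent with the latency drop reported above, although it clearly does not explain the remaining gap.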
-
430ms latency over 50K vectors on a local SSD is abnormally high, though. @davidmyriel @AyushExel @BubbleCal could you take a look? For reference, a brute-force scan over that data on my MacBook Pro might be even faster.
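As a point of comparison, a brute-force scan simply computes the distance from the query to every stored vector and keeps the k smallest. The sketch below shows that computation in plain Python, assuming squared L2 distance; `brute_force_top_k` is a hypothetical name, and pure Python is far slower than the vectorized scan a database would run, so this illustrates what is computed (and can produce ground truth for recall checks on small samples), not how fast it can be.

```python
import heapq
from typing import List, Sequence, Tuple


def brute_force_top_k(
    data: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[float, int]]:
    """Exact nearest neighbors by scanning every vector.

    Returns up to k (squared_l2_distance, row_index) pairs, closest first.
    """

    def squared_l2(vec: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, query))

    # One distance per row; nsmallest keeps only the k best while iterating.
    scored = ((squared_l2(vec), idx) for idx, vec in enumerate(data))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    toy_data = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    for dist, idx in brute_force_top_k(toy_data, [0.9, 0.9], k=2):
        print(idx, round(dist, 4))
```

Because the scan touches every row, its result is exact by construction, which makes it a useful sanity baseline when an indexed query looks both slower and less accurate than expected.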
-
it seems the |
-
As mentioned above, that's not relevant for this particular test. There is no filtering by id.
Could you please elaborate on that? I would expect queries per second to still be higher than without an index, regardless of whether the config is ideal. Also, given that the num_partitions default is excessive for the small data set in this test, I would think that recall shouldn't be this low.
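One way to reason about the partition question is simple arithmetic. The sketch below assumes a partition-based (IVF-style) index in which a query probes only some of the partitions; the helper names, the `nprobes` parameter, and the partition counts in the example loop are illustrative assumptions, not LanceDB defaults. It also includes a plain recall@k helper for comparing indexed results against exact ground truth.

```python
from typing import Hashable, Sequence


def rows_per_partition(num_rows: int, num_partitions: int) -> float:
    """Average number of rows per partition if rows spread evenly."""
    return num_rows / num_partitions


def scanned_fraction(num_partitions: int, nprobes: int) -> float:
    """Rough share of the data a query visits when probing nprobes partitions."""
    return min(nprobes, num_partitions) / num_partitions


def recall_at_k(
    retrieved: Sequence[Hashable],
    ground_truth: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of the true top-k ids that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    return len(truth & set(retrieved[:k])) / len(truth)


if __name__ == "__main__":
    num_rows = 50_000  # data set size discussed in this thread
    for partitions in (16, 64, 256):  # hypothetical values, not defaults
        print(
            partitions,
            rows_per_partition(num_rows, partitions),
            scanned_fraction(partitions, nprobes=8),  # hypothetical nprobes
        )
```

Under these assumptions, more partitions at a fixed probe count means a smaller share of the data is visited per query, so the partition count and the probe count have to be considered together when judging both recall and latency.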
-
I agree we could make defaults much better, especially if we had a way to ask the user their desired in-sample recall. I wrote this up here: lance-format/lance#4094