- Notifications
You must be signed in to change notification settings - Fork240
Breaking revamp of Rust bindings for v3#670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Every method has a Sized bound anyway, so might as well put it a levelhigher.Also mark VectorType unsafe, since implementors must follow safetyinvariants, or else the entire program becomes unsound.
ashvardanian commentedNov 1, 2025
Thanks for great suggestions,@tnibler! I'm fine with mega-PRs staged for major library rewrites. Assuming we are targeting the v3, it will take some time to fully rewrite the lib, but I'll keep you posted there. Your suggestions are invaluable 🤗 Before USearch v3, I'll be releasingForkUnion v3 and UCall v1. The former has a Rust bindings as well, in case you have opinions on better parallel/concurrent abstractions for Rust 🦀 |
ashvardanian commentedNov 1, 2025
P.S.: I'd love to include your entire commit history and thought process around different abstractions. It would be great if we could follow the same commit naming structure shared across the history of all projects I maintain, to be compatible withTinySemVer automation. Let's prefix the commits with intent verbs, like |
Uh oh!
There was an error while loading.Please reload this page.
tnibler commentedNov 1, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Oh yes you're right the commits are mess, none of this is final I was just looking for input and okay/disagreement on the rought direction. I'll rebase things into a neat order :) The rough thought process:
Re: casting/number type conversion checks, they're just nice to have. They have to be conservative and really only prevent the obviously nonsensical casts. |
It has to be pub since it's exposed through the public VectorType trait,but details of how FFI calls, trampolines etc are handled probablyshouldn't be part of the public API.
This allows us to remove the unsafe `view_from_buffer` function byintroducing a new IndexView type that is associated with the buffer'slifetime.It also prevents invalid usage of immutable Index instances, which wouldthrow C++ exceptions if e.g., add() was called on an immutable view.
Related to#574 , draft for the v3 Rust bindings. If we're gonna break APIs, let's just go all the way :D
The goal is to prevent unsoundness from the safe Rust interface, turning as many illegal operations as possible into compile errors , and catch the others at runtime.
Sorry for the mega-PR, but I don't know if it makes sense to split this up since all changes kind of affect everything.
Error handling, type safety
In general: existing methods on
VectorTypebecomeunsafe fn xxx_unchecked, and safe variants are added that check all required invariants.There's some ambiguity as to what is actually invalid/unsafe here. As far as I can tell, using an Index created with one
ScalarKind(sayint8) as one of a different type (like inserting anf64vector) does not actually cause any memory unsafety. It can garble data though, which I'd say is enough to make the operationunsafe(pretty common in the ecosystem AFAIK?).So I've added a check that the casts that will be done on the C++ make sense, but it would be good to have more opinions on which ones are ok and which ones are almost certainly mistakes. Some are easy (
f32tof16,bf16is ok,f64tob1x8is not for example), others less so.Safe
Indexviews into file, bufferCurrently you can have an
Indexreading a buffer passed in by the user, which will hopefully stay live as long as theIndex. Though the method is marked unsafe, it's easy to create a dangling pointer situation which is unfortunate in Rust of all places. Also, these immutableIndexviews will throw exceptions if any mutating methods are called on them.My proposed solution is to separate the regular, mutable
Indexthat owns its memory and the immutableIndexViewinto different types. This wayIndexViewcan be associated with a lifetime so the backing buffer can't be dropped while it's being used, and theIndexViewobject (holding a pointer) can not be moved between threads.While we're at it, splitting mutating and readonly methods on
Indexinto different traits allowsIndexViewto only expose read methods and prevent some runtime exceptions this way.Custom metric closure shenanigans
This part could maybe be split into a different PR. First a lot of duplicated code can be written in a generic way, which is always nice. There's also a memory leak pointed out in#629 (fixed by properly dropping the currently set metric closure when it's overwritten).
I only noticed that because I wanted to see how ergonomics could be improved here. Currently custom metrics are boxed closures taking
*const Targuments, which would be very nice to instead turn into&[T]slices of correct length/dimensionality. Also, closures are currently double-boxed becauseBox<dyn Fn..>is a wide pointer, andmetric_punned_twants a regular pointer, so there's three (if I'm counting correctly) pointer dereferences per call to the metric function (trampoline -> pointer to wide pointer -> dynamic dispatch fordyn Fntrait object).Ithink it's possible to add a shim in there to turn the pointers into slices without adding another indirection (currently
closure_addressis a pointer to theBox<dyn>, when it could instead point to the trait object/wide pointer instead and the trampoline casts that into a valid trait object reference). But I'm not sure if it's worth it when you could also just allow the user a regularfn (&[T], &[T]) -> Distancepointer, which avoid a lot of that and covers many (most?) use cases just fine.FFI and closures is a difficult topic though so it would be cool for other to weigh in and check what I'm saying is correct.