Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Breaking revamp of Rust bindings for v3#670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
tnibler wants to merge7 commits intounum-cloud:main
base:main
Choose a base branch
Loading
fromtnibler:safe-unsafe-rust

Conversation

@tnibler
Copy link

Related to#574 , draft for the v3 Rust bindings. If we're gonna break APIs, let's just go all the way :D

The goal is to prevent unsoundness from the safe Rust interface, turning as many illegal operations as possible into compile errors , and catch the others at runtime.

Sorry for the mega-PR, but I don't know if it makes sense to split this up since all changes kind of affect everything.

Error handling, type safety

In general: existing methods onVectorType becomeunsafe fn xxx_unchecked, and safe variants are added that check all required invariants.

There's some ambiguity as to what is actually invalid/unsafe here. As far as I can tell, using an Index created with oneScalarKind (sayint8) as one of a different type (like inserting anf64 vector) does not actually cause any memory unsafety. It can garble data though, which I'd say is enough to make the operationunsafe (pretty common in the ecosystem AFAIK?).

So I've added a check that the casts that will be done on the C++ make sense, but it would be good to have more opinions on which ones are ok and which ones are almost certainly mistakes. Some are easy (f32 tof16,bf16 is ok,f64 tob1x8 is not for example), others less so.

SafeIndex views into file, buffer

Currently you can have anIndex reading a buffer passed in by the user, which will hopefully stay live as long as theIndex. Though the method is marked unsafe, it's easy to create a dangling pointer situation which is unfortunate in Rust of all places. Also, these immutableIndex views will throw exceptions if any mutating methods are called on them.

My proposed solution is to separate the regular, mutableIndex that owns its memory and the immutableIndexView into different types. This wayIndexView can be associated with a lifetime so the backing buffer can't be dropped while it's being used, and theIndexView object (holding a pointer) can not be moved between threads.

While we're at it, splitting mutating and readonly methods onIndex into different traits allowsIndexView to only expose read methods and prevent some runtime exceptions this way.

Custom metric closure shenanigans

This part could maybe be split into a different PR. First a lot of duplicated code can be written in a generic way, which is always nice. There's also a memory leak pointed out in#629 (fixed by properly dropping the currently set metric closure when it's overwritten).

I only noticed that because I wanted to see how ergonomics could be improved here. Currently custom metrics are boxed closures taking*const T arguments, which would be very nice to instead turn into&[T] slices of correct length/dimensionality. Also, closures are currently double-boxed becauseBox<dyn Fn..> is a wide pointer, andmetric_punned_t wants a regular pointer, so there's three (if I'm counting correctly) pointer dereferences per call to the metric function (trampoline -> pointer to wide pointer -> dynamic dispatch fordyn Fn trait object).

Ithink it's possible to add a shim in there to turn the pointers into slices without adding another indirection (currentlyclosure_address is a pointer to theBox<dyn>, when it could instead point to the trait object/wide pointer instead and the trampoline casts that into a valid trait object reference). But I'm not sure if it's worth it when you could also just allow the user a regularfn (&[T], &[T]) -> Distance pointer, which avoid a lot of that and covers many (most?) use cases just fine.

FFI and closures is a difficult topic though so it would be cool for other to weigh in and check what I'm saying is correct.

ashvardanian reacted with heart emoji
Every method has a Sized bound anyway, so might as well put it a levelhigher.Also mark VectorType unsafe, since implementors must follow safetyinvariants, or else the entire program becomes unsound.
@ashvardanian
Copy link
Contributor

Thanks for great suggestions,@tnibler! I'm fine with mega-PRs staged for major library rewrites. Assuming we are targeting the v3, it will take some time to fully rewrite the lib, but I'll keep you posted there. Your suggestions are invaluable 🤗

Before USearch v3, I'll be releasingForkUnion v3 and UCall v1. The former has a Rust bindings as well, in case you have opinions on better parallel/concurrent abstractions for Rust 🦀

@ashvardanian
Copy link
Contributor

P.S.: I'd love to include your entire commit history and thought process around different abstractions. It would be great if we could follow the same commit naming structure shared across the history of all projects I maintain, to be compatible withTinySemVer automation. Let's prefix the commits with intent verbs, likeAdd:,Improve:,Fix:,Docs:,Chore:, orBreak: 🤗

@ashvardanianashvardanian added the v3Breaking changes planned for v3 labelNov 1, 2025
@ashvardanianashvardanian marked this pull request as ready for reviewNovember 1, 2025 13:03
@tnibler
Copy link
Author

tnibler commentedNov 1, 2025
edited
Loading

Oh yes you're right the commits are mess, none of this is final I was just looking for input and okay/disagreement on the rought direction. I'll rebase things into a neat order :)

The rough thought process:

  • If some methods need/want safe wrappers with error checks around the FFI call, it doesn't really cost anything to have the safe/unsafe _unchecked for all/most other methods as well even if they don't do anything special. That way, changes to requirements/invariants for the C++ API can happen without breaking the entire safe Rust API, since everything returnsResult and users have to handle errors. Just gives a bit more flexibility for semver and everything.
  • The safety checks are also mostly the same for allVectorType implementors, so just one default implementation in the trait covers all of them which is nice.
  • Readonly/Read-write distinction at the type/trait level withIndex andIndexView prevents illegal mutation calls to immutableIndex at compile time
  • SeparateIndex andIndexView types also allows tracking the lifetime of the backing buffer for mmapped/in-memory views without adding a lifetime parameter toIndex which would be really annoying.

Re: casting/number type conversion checks, they're just nice to have. They have to be conservative and really only prevent the obviously nonsensical casts.

It has to be pub since it's exposed through the public VectorType trait,but details of how FFI calls, trampolines etc are handled probablyshouldn't be part of the public API.
This allows us to remove the unsafe `view_from_buffer` function byintroducing a new IndexView type that is associated with the buffer'slifetime.It also prevents invalid usage of immutable Index instances, which wouldthrow C++ exceptions if e.g., add() was called on an immutable view.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@ashvardanianashvardanianashvardanian left review comments

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

v3Breaking changes planned for v3

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@tnibler@ashvardanian

[8]ページ先頭

©2009-2025 Movatter.jp