- Notifications
You must be signed in to change notification settings - Fork338
Make mapv_into_any() work for ArcArray, resolves #1280#1283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
e8a73e6
to2ac3c9f
CompareUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
tests/array.rs Outdated
#[test] | ||
fn mapv_into_any_diff_types() { | ||
let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]]; | ||
let a_even: Array<bool, _> = array![[false, true, false], [true, false, true]]; | ||
assert_eq!(a.mapv_into_any(|a| a.round() as i32 % 2 == 0), a_even); | ||
let b: Array<_, _> = a.mapv_into_any(|a| a.round() as i32 % 2 == 0); | ||
assert_eq!(b, a_even); | ||
} | ||
#[test] | ||
fn mapv_into_any_arcarray_same_type() { | ||
let a: ArcArray<f64, _> = array![[1., 2., 3.], [4., 5., 6.]].into_shared(); | ||
let a_plus_one: Array<f64, _> = array![[2., 3., 4.], [5., 6., 7.]]; | ||
let b: ArcArray<_, _> = a.mapv_into_any(|a| a + 1.); | ||
assert_eq!(b, a_plus_one); | ||
} | ||
#[test] | ||
fn mapv_into_any_arcarray_diff_types() { | ||
let a: ArcArray<f64, _> = array![[1., 2., 3.], [4., 5., 6.]].into_shared(); | ||
let a_even: Array<bool, _> = array![[false, true, false], [true, false, true]]; | ||
let b: ArcArray<_, _> = a.mapv_into_any(|a| a.round() as i32 % 2 == 0); | ||
assert_eq!(b, a_even); | ||
} | ||
#[test] | ||
fn mapv_into_any_diff_outer_types() { | ||
let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]]; | ||
let a_plus_one: Array<f64, _> = array![[2., 3., 4.], [5., 6., 7.]]; | ||
let b: ArcArray<_, _> = a.mapv_into_any(|a| a + 1.); | ||
assert_eq!(b, a_plus_one); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can you add a test here that looks intoArcArray
-->Array
? That functionality is the original behavior of the function and should remain (or else we'd have to mark as an even greater breaking change). Adding other various tests from the combinatorial set would also be good, e.g.,ArcArray<i32>
-->Array<f32>
or some such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@akern40, it's great to see ndarray getting some love! Looking at this with fresh eyes after a year, I realized the implementation can be further simplified to avoid breaking changes. The implementation is similar to the original version of mapv_into_any(). The trait bounds have been rewritten to be generic over the input memory representation (e.g. OwnedRepr vs OwnedArcRepr), but the output is constrained to have the same memory representation as the input. This addresses the issue in#1280 without requiring any type hinting.
I have added some additional doctests and removed the unit test converting between Array <--> ArcArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@benkay86 this is awesome! I'm a big fan of the new implementation, I think it's clean. I'm going to close out the other threads in this review; I just had one more request about changing theassert
todebug_assert
(unless you can think of places where the types and traits are satisfied but theassert
fails).
44deee5
tofcc9967
CompareThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This looks great! I like that this approach is consistent with other places in the crate where we output an array of the same type, and the trait bounds appear to be simpler, too.
src/impl_methods.rs Outdated
// have the same kind of memory representation (OwnedRepr vs | ||
// OwnedArcRepr), then their memory representations should be the | ||
// same type, e.g. OwnedRepr<A> == OwnedRepr<B> | ||
assert!(core::any::TypeId::of::<S>() == core::any::TypeId::of::<<S as RawDataSubst<B>>::Output>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can we change this to adebug_assert
? My impression is that this checks a condition that wethink the type system should be ensuring, but we'd like the assert just to be sure; in other words, it's a little mini test case. Unless there are conditions in which the types and trait bounds are satisfied but thisassert
is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You are 100% correct! The type equality should always be satisfied -- the assertions are just a little sanity check around the unsafe code. We totally can change this to adebug_assert!()
. The use ofdebug_assert!()
appears to be consistent with the rest of the ndarray codebase.
I will point out that the compiler nominally optimizes away the assertion if it can prove the types are equal after monomorphization. For example, comment out line 6 ofthis example -- the assembly output does not change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Woah! Had no idea, but I suppose that makes sense - I do think it just expands into a conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I went ahead and changes theassert!()
to adebug_assert!()
as we discussed. Is there anything else you would like to see changed for this PR to be merged?
@benkay86 I was doing some thinking about this method not just in the context of concrete types but also in the context of using it in a method. Let's take the following example that's a play on your tests: a function that uses fnis_modulo<S,D>(arr:ArrayBase<S,D>,modulo:i32) ->Array<bool,D> while under this new version the signature is fnis_modulo<S,D>(arr:ArrayBase<S,D>,modulo:i32) ->ArrayBase<<SasRawDataSubst<bool>>::Output,D> Now, the verbosity is one thing - sometimes we need verbose types, we shouldn't always be afraid of them. My bigger concern is that we've actually erased the trait bounds that we originally had on Sorry I don't have a solution to this, at least not yet, and I've got some other things I'm trying to pull together around the repo. Happy to keep the conversation up, though - maybe I'm missing something? Or if you want to bounce ideas / solutions around. Really appreciate your persistence on this PR. |
I appreciate your help thinking about these trait bounds. The combination of fancy trait bounds + unsafe code can be difficult to reason about! With regard to the verbosity in the return value of your function signature, there may be ways to modify I don't think your second concern about "erasing" trait bounds is valid. (And if we could erase them... would that be a bad thing?) The trait bounds on I have added a test case with If you want to go in a totally different direction, I could try to rewrite |
akern40 commentedOct 30, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Sorry I should've been a little clearer when I was talking about "erasing" trait bounds. I'm worried about theoutput, not the input.
This is my point: we know that, but the compiler doesn't. (EDITED, for a better example) Let's say I use fnparity<S,D>(arr:ArrayBase<S,D>,modulo:i32) ->boolwhere --snip--// Trait bounds that enable `mapv_into_any` and modulo, but I'm too lazy to write right now{let output = arr.mapv_into_any(|a| a.round()asi32 %2 ==0);// I can only call a limited set of functions on `output`, those that only have the `RawData` bound.// So I can't map, can't sum, can't take the first or last, can't view, etc} Idk, maybe I'm being ridiculous. |
Ah, I see what you mean now. We could, without loss of generality for extant types in ndarray, add the following additional trait bounds to pubfnmapv_into_any<B,F>(self,mutf:F) ->ArrayBase<<SasRawDataSubst<B>>::Output,D>where// ... <Sas ndarray::RawDataSubst<i32>>::Output:DataOwned +DataMut,{// ...} However, it is not actually necessary to place this trait bound in the definition of use num_traits;fnround<S,A,D>(arr:ArrayBase<S,D>) ->ArrayBase<<Sas ndarray::RawDataSubst<i32>>::Output,D>whereS: ndarray::DataMut<Elem =A> + ndarray::RawDataSubst<i32> +'static,A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32> +'static,D:Dimension,ArrayBase<<Sas ndarray::RawDataSubst<i32>>::Output,D>:From<Array<i32,D>>,{ arr.mapv_into_any(|a| a.round().as_())}#[test]fntest_round(){let a:Array<f32,_> =array![1.1,2.6,3.0];let a_rounded:Array<i32,_> =array![1,3,3];let b =round(a);assert_eq!(b, a_rounded);} Now let's try something like your fnround_sum<S,A,D>(arr:ArrayBase<S,D>) ->i32whereS: ndarray::DataMut<Elem =A> + ndarray::RawDataSubst<i32> +'static,A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32> +'static,D:Dimension,ArrayBase<<Sas ndarray::RawDataSubst<i32>>::Output,D>:From<Array<i32,D>>, <Sas ndarray::RawDataSubst<i32>>::Output: ndarray::Data,{let output = arr.mapv_into_any(|a| a.round().as_()); output.sum()}#[test]fntest_round_sum(){let a:Array<f32,_> =array![1.1,2.6,3.0];let b =round_sum(a);assert_eq!(b,7);} As for the |
Might be time for me to sleep! Really excellent analysis here. I'm in favor of adding that bound to the function, as that would provide clearer error messages to users who may otherwise be confused as to why they can't call additional functions. I need to go to bed, but my last consideration now is the verbosity; let's put a bit of thought into whether there's an obvious path forward on lowering it. Happy to think about it this weekend a bit, or if you've got the time and inclination you can throw a design out. (P.S. Noticed you're at WashU! I'd say "Go Bears" or something but god knows we weren't a big sports school when I was there, can't imagine that's changed. Hope the semester is going well!) |
benkay86 commentedNov 1, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
OK, I've been playing around with the ergonomics. Here's one approach using a helper trait with (stable) generalized associated types to cut down on the trait bounds. The helper trait is something like: pubtraitMappableData:RawData +'static{typeSubst<B>:Data<Elem =B> +DataOwned;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>;}impl<A:'static>MappableDataforOwnedRepr<A>{typeSubst<B> =OwnedRepr<B>;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>{ arr}}impl<A:'static>MappableDataforOwnedArcRepr<A>{typeSubst<B> =OwnedArcRepr<B>;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>{ arr.into()}} The implementation of impl<A,S,D>ArrayBase<S,D>whereS:DataMut<Elem =A> +MappableData,D:Dimension,A:Clone +'static,{pubfnmapv_into_any<B,F>(self,mutf:F) ->ArrayBase<<SasMappableData>::Subst<B>,D>whereF:FnMut(A) ->B,B:'static,{if core::any::TypeId::of::<A>() == core::any::TypeId::of::<B>(){let f = |a|{let b =f(a);unsafe{unlimited_transmute::<B,A>(b)}};let output =self.mapv_into(f);debug_assert!(core::any::TypeId::of::<S>() == core::any::TypeId::of::<<SasMappableData>::Subst<B>>());unsafe{unlimited_transmute::<ArrayBase<S,D>,ArrayBase<<SasMappableData>::Subst<B>,D>>(output)}}else{ <SasMappableData>::from_owned_repr::<B,D>(self.mapv(f))}}} Now we can have more ergonomic calls. For concrete types: let a:Array<f64,_> =array![[1.,2.,3.],[4.,5.,6.]];let a_even = a.mapv_into_any(|a| a.round()asi32 %2 ==0) Function with generic element type but concrete data representation: fnround<A,D>(arr:Array<A,D>) ->Array<i32,D>whereA: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,D:Dimension,{ arr.mapv_into_any(|a| a.round().as_())}let a:Array<f32,_> =array![1.1,2.6,3.0];let a_rounded =round(a); Function generic over element and data representation: fnround<A,S,D>(arr:ArrayBase<S,D>) ->ArrayBase<<SasMappableData>::Subst<i32>,D>whereS: ndarray::DataMut<Elem =A> +MappableData,A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,D:Dimension,{ arr.mapv_into_any(|a| a.round().as_())}let a:Array<f32,_> =array![1.1,2.6,3.0];let a_rounded =round(a); Generic function operating on the (generic) output of fnround_sum<S,A,D>(arr:ArrayBase<S,D>) ->i32whereS: ndarray::DataMut<Elem =A> +MappableData,A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,D:Dimension,{let output = arr.mapv_into_any(|a| a.round().as_()); output.sum()}let a:Array<f32,_> =array![1.1,2.6,3.0];let a_sum =round_sum(a); Happy to let these ideas percolate a bit more, but let me know what you think. If we go this route then we'll have to decide on what to call the helper trait
Ha, I am always running into WashU alumni! What program were you in here? I'm a physician/scientist faculty using Rust to power my neuroimaging research. |
akern40 commentedNov 4, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I think this is a great design. I want to suggest a slight twist inspired by yours that I think balances between ergonomics for the user and maintenance / fragility for the library (and suggest an alternative name):
pubtraitDataSubst<A>:RawDataSubst<A> +DataOwnedwhereSelf::Output:DataOwned{fnfrom_owned<D:Dimension>(self_:Array<A,D>) ->ArrayBase<Self::Output,D>;} I like the name Unfortunately, this isn'tquite as succinct for users as yours, but it has the advantage of not repeating the associated types within the library, which I like. The definition of pubfnmapv_into_any<B,F>(self,mutf:F) ->ArrayBase<<SasRawDataSubst<B>>::Output,D>whereS:DataMut +DataSubst<B,Output:DataOwned>,F:FnMut(A) ->B,A:Clone +'static,B:'static,{if core::any::TypeId::of::<A>() == core::any::TypeId::of::<B>(){// A and B are the same type.// Wrap f in a closure of type FnMut(A) -> A .let f = |a|{let b =f(a);// Safe because A and B are the same type.unsafe{unlimited_transmute::<B,A>(b)}};// Delegate to mapv_into() using the wrapped closure.// Convert output to a uniquely owned array of type Array<A, D>.let output =self.mapv_into(f).into_owned();// Change the return type from Array<A, D> to Array<B, D>.// Again, safe because A and B are the same type.unsafe{unlimited_transmute::<Array<A,D>,ArrayBase<<SasRawDataSubst<B>>::Output,D>>(output)}}else{// A and B are not the same type.// Fallback to mapv().S::from_owned(self.mapv(f))}} Rust won't directly suggest the clean Thoughts?
I did my computer science degrees there! Exciting to see this library being used for research-level neuroscience. |
Oh and I don't think you need the |
benkay86 commentedNov 4, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
It's only needed if you want the extra
I considered this possibility since the trait is, conceptually, a natural extension of impl<A,S,D>ArrayBase<S,D>whereS:DataMut<Elem=A> +MappableData Instead you must introduce the constraint at the definition of pubfnmapv_into_any<B,F>(self,mutf:F> ->ArrayBase<<SasRawDataSubst<B>>::Output,D>whereS:DataMut<Elem =A> +DataSubst<B,Output:DataOwned> But now every function calling Unfortunately, the GAT version of
I'm wondering if |
akern40 commentedNov 5, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I recently got Miri support working as part of our CI/CD, so it actually will run Miri the next time you push!
This is true, but unfortunately I think it's trickier than that: implementing impl<'a,A:'a>RawDataSubstforViewRepr<&'aA>{typeOutput<B> =ViewRepr<&'aB>;// Problem occurs hereunsafefndata_subst<B>(self) ->Self::Output<B>{ViewRepr::new()}} You can't write
Ya, like I said, I was trying to find a middle ground. It's still significantly more succinct than the implementation without the additional trait, but it isn't quite as terse as the
No reason it shouldn't, especially if it means a quick |
benkay86 commentedNov 5, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
To make pubtraitMappableData:RawData{typeSubst<B>:Data<Elem =B> +DataOwnedwherefor<'b>B:'b;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>wherefor<'b>B:'b;}// omitting other impls for brevityimpl<'a,A>MappableDataforCowRepr<'a,A>{typeSubst<B> =CowRepr<'a,B>wherefor<'b>B:'b;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>wherefor<'b>B:'b{ arr.into()}} EDIT It looks like the HRB To your broader points about balancing maintainability with ergonomics, my counterargument is:
The final choice is yours. If you still prefer the |
Thanks for giving me time to think about this! After a bit of experimenting this morning, I like your approach. I'd like to propose a few changes:
/// An array representation trait for mapping to an owned array with a different element type.////// Functions such as [`mapv_into_any`](ArrayBase::mapv_into_any) that alter an array's/// underlying element type often want to preserve the storage type (e.g., `ArcArray`)/// of the original array. However, because Rust considers `OwnedRepr<A>` and `OwnedRepr<B>`/// to be completely different types, a trait must be used to indicate what the mapping is.////// This trait will map owning storage types to the element-swapped version of themselves;/// view types are mapped to `OwnedRepr`. The following table summarizes the mappings:////// | Original Storage Type | Corresponding Array Type | Mapped Storage Type | Output of `from_owned` |/// | ----------------------- | ------------------------ | ------------------- | ---------------------- |/// | `OwnedRepr<A>` | `Array<A, D>` | `OwnedRepr<B>` | `Array<B, D>` |/// | `OwnedArcRepr<A>` | `ArcArray<A, D>` | `OwnedArcRepr<B>` | `ArcArray<B, D>` |/// | `CowRepr<'a, A>` | `CowArray<'a, A, D>` | `CowRepr<'a, B>` | `CowArray<'a, B, D>` |/// | `ViewRepr<&'a A>` | `ArrayView<'a, A, D>` | `OwnedRepr<B>` | `Array<B, D>` |/// | `RawViewRepr<*const A>` | `RawArrayView<A, D>` | `OwnedRepr<B>` | `Array<B, D>` |pubtraitDataMappable:RawData{/// The element-swapped, owning storage representationtypeSubst<B>:Data<Elem =B> +DataOwned;/// Cheaply convert an owned [`Array<B, D>`] to a different owning array type, as dictated by `Subst`.////// The owning arrays implement `From`, which is the preferred method for changing the underlying storage./// This method (and trait) should be reserved for dealing with changing the element type.fnfrom_owned<B,D:Dimension>(self_:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>;} |
If you can put |
Thank you for your patience. I have finally had a chance to work on this. Please let me know what you think.
|
@benkay86 now it's my turn to thank you - I had to get our BLAS CI/CD situation handled before I could go back to other PRs. I like all of your suggestions - let's leave the You've done tremendous work here. If you would like me to cover the last mile I'd be happy to make the necessary changes. |
Thank you for kind words@akern40! I am happy to make the requested changes unless there is some tight release deadline that I am getting in the way of. After having (yet more) time to think about this, I have two thoughts I want to run by you. Number OneIt turns out there are two roughly-equally-efficient ways to implement The other implementation is to call mapv_into() on the input data representation directly. Then it transmutes from Do you have a preference between these two implementation strategies? Number TwoAs a final point for thought, I wonder if this PR should be merged at all, or if we should mark the original issue#1280 as won't fix.@RReverser never did give us an example of where he thought returning an
In fact, theexisting implementation of let output =self.mapv_into(f).into_owned(); For an Pros of this PR: obviously the current situation is confusing to users, and this PR creates some syntactic/generic sugar to make Cons: Rust struggles with trait solving and generic return types in particular. Adding a generic return type and the |
Note this is abreaking change that will require users of
mapv_into_any()
to tell the compiler, by annotating the return type, whether the returned array should be anArray
or anArcArray
.