Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Make mapv_into_any() work for ArcArray, resolves #1280#1283

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
benkay86 wants to merge1 commit intorust-ndarray:master
base:master
Choose a base branch
Loading
frombenkay86:master

Conversation

benkay86
Copy link
Contributor

Note this is abreaking change that will require users ofmapv_into_any() to tell the compiler, by annotating the return type, whether the returned array should be anArray or anArcArray.

@benkay86benkay86force-pushed themaster branch 2 times, most recently frome8a73e6 to2ac3c9fCompareMay 4, 2023 17:07
Comment on lines 1002 to 1125
#[test]
fn mapv_into_any_diff_types() {
let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]];
let a_even: Array<bool, _> = array![[false, true, false], [true, false, true]];
assert_eq!(a.mapv_into_any(|a| a.round() as i32 % 2 == 0), a_even);
let b: Array<_, _> = a.mapv_into_any(|a| a.round() as i32 % 2 == 0);
assert_eq!(b, a_even);
}

#[test]
fn mapv_into_any_arcarray_same_type() {
let a: ArcArray<f64, _> = array![[1., 2., 3.], [4., 5., 6.]].into_shared();
let a_plus_one: Array<f64, _> = array![[2., 3., 4.], [5., 6., 7.]];
let b: ArcArray<_, _> = a.mapv_into_any(|a| a + 1.);
assert_eq!(b, a_plus_one);
}

#[test]
fn mapv_into_any_arcarray_diff_types() {
let a: ArcArray<f64, _> = array![[1., 2., 3.], [4., 5., 6.]].into_shared();
let a_even: Array<bool, _> = array![[false, true, false], [true, false, true]];
let b: ArcArray<_, _> = a.mapv_into_any(|a| a.round() as i32 % 2 == 0);
assert_eq!(b, a_even);
}

#[test]
fn mapv_into_any_diff_outer_types() {
let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]];
let a_plus_one: Array<f64, _> = array![[2., 3., 4.], [5., 6., 7.]];
let b: ArcArray<_, _> = a.mapv_into_any(|a| a + 1.);
assert_eq!(b, a_plus_one);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you add a test here that looks intoArcArray -->Array? That functionality is the original behavior of the function and should remain (or else we'd have to mark as an even greater breaking change). Adding other various tests from the combinatorial set would also be good, e.g.,ArcArray<i32> -->Array<f32> or some such.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@akern40, it's great to see ndarray getting some love! Looking at this with fresh eyes after a year, I realized the implementation can be further simplified to avoid breaking changes. The implementation is similar to the original version of mapv_into_any(). The trait bounds have been rewritten to be generic over the input memory representation (e.g. OwnedRepr vs OwnedArcRepr), but the output is constrained to have the same memory representation as the input. This addresses the issue in#1280 without requiring any type hinting.

I have added some additional doctests and removed the unit test converting between Array <--> ArcArray.

bluss reacted with hooray emojiakern40 reacted with heart emoji
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@benkay86 this is awesome! I'm a big fan of the new implementation, I think it's clean. I'm going to close out the other threads in this review; I just had one more request about changing theassert todebug_assert (unless you can think of places where the types and traits are satisfied but theassert fails).

@benkay86benkay86force-pushed themaster branch 4 times, most recently from44deee5 tofcc9967CompareOctober 24, 2024 20:11
Copy link
Collaborator

@akern40akern40 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This looks great! I like that this approach is consistent with other places in the crate where we output an array of the same type, and the trait bounds appear to be simpler, too.

// have the same kind of memory representation (OwnedRepr vs
// OwnedArcRepr), then their memory representations should be the
// same type, e.g. OwnedRepr<A> == OwnedRepr<B>
assert!(core::any::TypeId::of::<S>() == core::any::TypeId::of::<<S as RawDataSubst<B>>::Output>());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can we change this to adebug_assert? My impression is that this checks a condition that wethink the type system should be ensuring, but we'd like the assert just to be sure; in other words, it's a little mini test case. Unless there are conditions in which the types and trait bounds are satisfied but thisassert is false?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You are 100% correct! The type equality should always be satisfied -- the assertions are just a little sanity check around the unsafe code. We totally can change this to adebug_assert!(). The use ofdebug_assert!() appears to be consistent with the rest of the ndarray codebase.

I will point out that the compiler nominally optimizes away the assertion if it can prove the types are equal after monomorphization. For example, comment out line 6 ofthis example -- the assembly output does not change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Woah! Had no idea, but I suppose that makes sense - I do think it just expands into a conditional.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I went ahead and changes theassert!() to adebug_assert!() as we discussed. Is there anything else you would like to see changed for this PR to be merged?

@akern40
Copy link
Collaborator

@benkay86 I was doing some thinking about this method not just in the context of concrete types but also in the context of using it in a method. Let's take the following example that's a play on your tests: a function that usesmapv_into_any to compute an array modulo some value, which we'll also take as input. Under the current version ofndarray, that signature would look like

fnis_modulo<S,D>(arr:ArrayBase<S,D>,modulo:i32) ->Array<bool,D>

while under this new version the signature is

fnis_modulo<S,D>(arr:ArrayBase<S,D>,modulo:i32) ->ArrayBase<<SasRawDataSubst<bool>>::Output,D>

Now, the verbosity is one thing - sometimes we need verbose types, we shouldn't always be afraid of them. My bigger concern is that we've actually erased the trait bounds that we originally had onS: we (the humans) know that the output is: Data, but Rust doesn't know that. We're then left with an array that lacks any method that requires more thanS: RawData.

Sorry I don't have a solution to this, at least not yet, and I've got some other things I'm trying to pull together around the repo. Happy to keep the conversation up, though - maybe I'm missing something? Or if you want to bounce ideas / solutions around. Really appreciate your persistence on this PR.

@benkay86
Copy link
ContributorAuthor

I appreciate your help thinking about these trait bounds. The combination of fancy trait bounds + unsafe code can be difficult to reason about!

With regard to the verbosity in the return value of your function signature, there may be ways to modifyRawDataSubst or define other helper traits to make writing generic functions that usemapv_into_any() more ergonomic. A related problem you have stumbled upon is that while this PR is backwards-compatible with calls tomapv_into_any() using concrete types, it introduces new trait bounds and is thus a potentiallybreaking change for generic functions callingmapv_into_any() which would themselves need new trait bounds.

I don't think your second concern about "erasing" trait bounds is valid. (And if we could erase them... would that be a bad thing?) The trait bounds onmapv_into_any() requireS: DataMut which constrainsS to be anOwnedRepr,OwnedArcRepr,CowRepr, orViewRepr. An additional trait boundArrayBase<<S as RawDataSubst<B>>::Output, D>: From<Array<B,D>> requires that we can make the output type from anOwnedRepr, which effectively forbidsS: ViewRepr. So, for better or worse, we can only useOwnedRepr,OwnedArcRepr, orCowRepr.

I have added a test case withCowArray.


If you want to go in a totally different direction, I could try to rewritemapv_into_any() as a trait implemented only by someArrayBase, akin tondarray_linalg::eig::Eig. This trait could have an associated typeOutput that simplifies the return values for genetic methods callingmapv_into_any(). Wemight be able to write bespoke implementations forOwnedRepr,OwnedArcRepr, andCowRepr.

@akern40
Copy link
Collaborator

akern40 commentedOct 30, 2024
edited
Loading

Sorry I should've been a little clearer when I was talking about "erasing" trait bounds. I'm worried about theoutput, not the input.

So, for better or worse, we can only use OwnedRepr, OwnedArcRepr, or CowRepr

This is my point: we know that, but the compiler doesn't.

(EDITED, for a better example)

Let's say I usemapv_into_any with the new signature. Then the following code:

fnparity<S,D>(arr:ArrayBase<S,D>,modulo:i32) ->boolwhere --snip--// Trait bounds that enable `mapv_into_any` and modulo, but I'm too lazy to write right now{let output = arr.mapv_into_any(|a| a.round()asi32 %2 ==0);// I can only call a limited set of functions on `output`, those that only have the `RawData` bound.// So I can't map, can't sum, can't take the first or last, can't view, etc}

Idk, maybe I'm being ridiculous.mapv_into_any already has a frustrating'static requirement in several places, so is anyone really calling it in generic code? Maybe not. What do you make of this concern?

@benkay86
Copy link
ContributorAuthor

Ah, I see what you mean now. We could, without loss of generality for extant types in ndarray, add the following additional trait bounds tomapv_into_any(). Indeed, if we were going to define a helper trait, we would want to include these!

pubfnmapv_into_any<B,F>(self,mutf:F) ->ArrayBase<<SasRawDataSubst<B>>::Output,D>where// ...       <Sas ndarray::RawDataSubst<i32>>::Output:DataOwned +DataMut,{// ...}

However, it is not actually necessary to place this trait bound in the definition ofmapv_into_any(). It can be added to yourparity() function as needed. For a complete example, let us begin by defining an alternative toArrayBase::round(). (To your previous point, the required trait bounds are rather verbose.)

use num_traits;fnround<S,A,D>(arr:ArrayBase<S,D>) ->ArrayBase<<Sas ndarray::RawDataSubst<i32>>::Output,D>whereS: ndarray::DataMut<Elem =A> + ndarray::RawDataSubst<i32> +'static,A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32> +'static,D:Dimension,ArrayBase<<Sas ndarray::RawDataSubst<i32>>::Output,D>:From<Array<i32,D>>,{    arr.mapv_into_any(|a| a.round().as_())}#[test]fntest_round(){let a:Array<f32,_> =array![1.1,2.6,3.0];let a_rounded:Array<i32,_> =array![1,3,3];let b =round(a);assert_eq!(b, a_rounded);}

Now let's try something like yourparity() function. We will have to add one additional trait bound to our definition (the last one in the where clause) to convince the compiler that we can callArrayBase::sum(). But it does work once we have all the right trait bounds!

fnround_sum<S,A,D>(arr:ArrayBase<S,D>) ->i32whereS: ndarray::DataMut<Elem =A> + ndarray::RawDataSubst<i32> +'static,A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32> +'static,D:Dimension,ArrayBase<<Sas ndarray::RawDataSubst<i32>>::Output,D>:From<Array<i32,D>>,    <Sas ndarray::RawDataSubst<i32>>::Output: ndarray::Data,{let output = arr.mapv_into_any(|a| a.round().as_());    output.sum()}#[test]fntest_round_sum(){let a:Array<f32,_> =array![1.1,2.6,3.0];let b =round_sum(a);assert_eq!(b,7);}

As for the'static trait bounds, note that this means something different from static lifetimes and is not as limiting as you might think.
https://doc.rust-lang.org/rust-by-example/scope/lifetime/static_lifetime.html#trait-bound

akern40 reacted with heart emoji

@akern40
Copy link
Collaborator

It can be added to your parity() function as needed

image

Might be time for me to sleep! Really excellent analysis here. I'm in favor of adding that bound to the function, as that would provide clearer error messages to users who may otherwise be confused as to why they can't call additional functions.

I need to go to bed, but my last consideration now is the verbosity; let's put a bit of thought into whether there's an obvious path forward on lowering it. Happy to think about it this weekend a bit, or if you've got the time and inclination you can throw a design out.

(P.S. Noticed you're at WashU! I'd say "Go Bears" or something but god knows we weren't a big sports school when I was there, can't imagine that's changed. Hope the semester is going well!)

@benkay86
Copy link
ContributorAuthor

benkay86 commentedNov 1, 2024
edited
Loading

OK, I've been playing around with the ergonomics. Here's one approach using a helper trait with (stable) generalized associated types to cut down on the trait bounds. The helper trait is something like:

pubtraitMappableData:RawData +'static{typeSubst<B>:Data<Elem =B> +DataOwned;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>;}impl<A:'static>MappableDataforOwnedRepr<A>{typeSubst<B> =OwnedRepr<B>;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>{        arr}}impl<A:'static>MappableDataforOwnedArcRepr<A>{typeSubst<B> =OwnedArcRepr<B>;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>{        arr.into()}}

The implementation ofmapv_into_any() then boils down to:

impl<A,S,D>ArrayBase<S,D>whereS:DataMut<Elem =A> +MappableData,D:Dimension,A:Clone +'static,{pubfnmapv_into_any<B,F>(self,mutf:F) ->ArrayBase<<SasMappableData>::Subst<B>,D>whereF:FnMut(A) ->B,B:'static,{if core::any::TypeId::of::<A>() == core::any::TypeId::of::<B>(){let f = |a|{let b =f(a);unsafe{unlimited_transmute::<B,A>(b)}};let output =self.mapv_into(f);debug_assert!(core::any::TypeId::of::<S>() == core::any::TypeId::of::<<SasMappableData>::Subst<B>>());unsafe{unlimited_transmute::<ArrayBase<S,D>,ArrayBase<<SasMappableData>::Subst<B>,D>>(output)}}else{            <SasMappableData>::from_owned_repr::<B,D>(self.mapv(f))}}}

Now we can have more ergonomic calls. For concrete types:

let a:Array<f64,_> =array![[1.,2.,3.],[4.,5.,6.]];let a_even = a.mapv_into_any(|a| a.round()asi32 %2 ==0)

Function with generic element type but concrete data representation:

fnround<A,D>(arr:Array<A,D>) ->Array<i32,D>whereA: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,D:Dimension,{   arr.mapv_into_any(|a| a.round().as_())}let a:Array<f32,_> =array![1.1,2.6,3.0];let a_rounded =round(a);

Function generic over element and data representation:

fnround<A,S,D>(arr:ArrayBase<S,D>) ->ArrayBase<<SasMappableData>::Subst<i32>,D>whereS: ndarray::DataMut<Elem =A> +MappableData,A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,D:Dimension,{    arr.mapv_into_any(|a| a.round().as_())}let a:Array<f32,_> =array![1.1,2.6,3.0];let a_rounded =round(a);

Generic function operating on the (generic) output ofmapv_into_any():

fnround_sum<S,A,D>(arr:ArrayBase<S,D>) ->i32whereS: ndarray::DataMut<Elem =A> +MappableData,A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,D:Dimension,{let output = arr.mapv_into_any(|a| a.round().as_());    output.sum()}let a:Array<f32,_> =array![1.1,2.6,3.0];let a_sum =round_sum(a);

Happy to let these ideas percolate a bit more, but let me know what you think. If we go this route then we'll have to decide on what to call the helper traitMappableData and what file/module to declare it in.

(P.S. Noticed you're at WashU! I'd say "Go Bears" or something but god knows we weren't a big sports school when I was there, can't imagine that's changed. Hope the semester is going w

Ha, I am always running into WashU alumni! What program were you in here? I'm a physician/scientist faculty using Rust to power my neuroimaging research.

@akern40
Copy link
Collaborator

akern40 commentedNov 4, 2024
edited
Loading

I think this is a great design. I want to suggest a slight twist inspired by yours that I think balances between ergonomics for the user and maintenance / fragility for the library (and suggest an alternative name):DataSusbt. This would be a subtrait ofRawDataSubst and would do two things:

  1. Restrict theRawDataSubst associated type to beDataOwned
  2. Define afrom_owned function that can map from an array into the associated type ofRawDataSubst
pubtraitDataSubst<A>:RawDataSubst<A> +DataOwnedwhereSelf::Output:DataOwned{fnfrom_owned<D:Dimension>(self_:Array<A,D>) ->ArrayBase<Self::Output,D>;}

I like the nameDataSubst because it reveals both the relationship toRawDataSubst, upon which this depends. This would be implemented forOwnedRepr,OwnedArcRepr, andCowRepr.

Unfortunately, this isn'tquite as succinct for users as yours, but it has the advantage of not repeating the associated types within the library, which I like. The definition ofmapv_into_any becomes

pubfnmapv_into_any<B,F>(self,mutf:F) ->ArrayBase<<SasRawDataSubst<B>>::Output,D>whereS:DataMut +DataSubst<B,Output:DataOwned>,F:FnMut(A) ->B,A:Clone +'static,B:'static,{if core::any::TypeId::of::<A>() == core::any::TypeId::of::<B>(){// A and B are the same type.// Wrap f in a closure of type FnMut(A) -> A .let f = |a|{let b =f(a);// Safe because A and B are the same type.unsafe{unlimited_transmute::<B,A>(b)}};// Delegate to mapv_into() using the wrapped closure.// Convert output to a uniquely owned array of type Array<A, D>.let output =self.mapv_into(f).into_owned();// Change the return type from Array<A, D> to Array<B, D>.// Again, safe because A and B are the same type.unsafe{unlimited_transmute::<Array<A,D>,ArrayBase<<SasRawDataSubst<B>>::Output,D>>(output)}}else{// A and B are not the same type.// Fallback to mapv().S::from_owned(self.mapv(f))}}

Rust won't directly suggest the cleanDataSubst<B, Output: DataOwned> but we can put something about that in the documentation for one or both ofDataSubst andmapv_into_any.

Thoughts?

Ha, I am always running into WashU alumni! What program were you in here? I'm a physician/scientist faculty using Rust to power my neuroimaging research.

I did my computer science degrees there! Exciting to see this library being used for research-level neuroscience.

@akern40
Copy link
Collaborator

Oh and I don't think you need theS: 'static bound? It seems to be compiling for me without it

@benkay86
Copy link
ContributorAuthor

benkay86 commentedNov 4, 2024
edited
Loading

Oh and I don't think you need the S: 'static bound? It seems to be compiling for me without it

It's only needed if you want the extradebug_assert!() to work. I think we could just drop thedebug_asesrt!(), especially if there is some way to run the code through miri...

I think this is a great design. I want to suggest a slight twist inspired by yours that I think balances between ergonomics for the user and maintenance / fragility for the library (and suggest an alternative name): DataSusbt. This would be a subtrait of RawDataSubst...

I considered this possibility since the trait is, conceptually, a natural extension ofRawDataSubst. Your proposedDataSubst trait is parameterized by its element typeA in parallel construction to theRawDataSubst trait. As I think you've noticed, this makes it impossible to do whatMappableData lets you do with generalized associated types:

impl<A,S,D>ArrayBase<S,D>whereS:DataMut<Elem=A> +MappableData

Instead you must introduce the constraint at the definition ofmapv_into_any() where the typeB is known:

pubfnmapv_into_any<B,F>(self,mutf:F> ->ArrayBase<<SasRawDataSubst<B>>::Output,D>whereS:DataMut<Elem =A> +DataSubst<B,Output:DataOwned>

But now every function callingmapv_into_any() that wants to be generic overS must specify the sameS: DataMut<Elem = A> + DataSubst<B, Output: DataOwned> bound.☹️

Unfortunately, the GAT version ofMappableData can't be a subtrait of the non-GATRawDataSubst<B>. It's unfortunate that GATs were not available whenRawDataSubst was written. RewritingRawDataSubst now would be a very big breaking change. YourDataSubst<B> trait is certainly more elegant, but necessarily more verbose...

This would be implemented for OwnedRepr, OwnedArcRepr, and CowRepr.

I'm wondering ifmapv_into_any() needs to supportCowRepr? I've never usedCowArray, can you point me to any examples of how you would use one?

@akern40
Copy link
Collaborator

akern40 commentedNov 5, 2024
edited
Loading

It's only needed if you want the extra debug_assert!() to work. I think we could just drop the debug_asesrt!(), especially if there is some way to run the code through miri...

I recently got Miri support working as part of our CI/CD, so it actually will run Miri the next time you push!

It's unfortunate that GATs were not available when RawDataSubst was written

This is true, but unfortunately I think it's trickier than that: implementingRawDataSubst as a GAT forViewRepr requires that your genericB live at least as long as the accompanying typeA:

impl<'a,A:'a>RawDataSubstforViewRepr<&'aA>{typeOutput<B> =ViewRepr<&'aB>;// Problem occurs hereunsafefndata_subst<B>(self) ->Self::Output<B>{ViewRepr::new()}}

You can't writeOutput<B: 'a> because you don't have a lifetime'a to use when implementingRawDataSubst for the other types. And makingRawDataSubst generic on'a causes lifetimes to suddenly boil up your data traits. There might be a way to get that design right, but it's a bigger change than this method, as you pointed out.

YourDataSubst<B> trait is certainly more elegant, but necessarily more verbose...

Ya, like I said, I was trying to find a middle ground. It's still significantly more succinct than the implementation without the additional trait, but it isn't quite as terse as theMappableData trait. It's a trade-off I'm happy with, but it's still a trade-off.

I'm wondering if mapv_into_any() needs to support CowRepr? I've never used CowArray, can you point me to any examples of how you would use one?

No reason it shouldn't, especially if it means a quickimpl for theDataSubst trait. For an example, check outas_standard_layout, which returns aCowArray. This avoids a copy for arrays which are already in standard layout.

@benkay86
Copy link
ContributorAuthor

benkay86 commentedNov 5, 2024
edited
Loading

To makeMappableData work withCowRepr without having lifetimes "boil up your data traits," it seems like we can usehigher-rank trait bounds to get away with something like this. (I confess this is not something I am expert on.) Then we can just useMappableData in our impls without needing anyMappableData<'a>.

pubtraitMappableData:RawData{typeSubst<B>:Data<Elem =B> +DataOwnedwherefor<'b>B:'b;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>wherefor<'b>B:'b;}// omitting other impls for brevityimpl<'a,A>MappableDataforCowRepr<'a,A>{typeSubst<B> =CowRepr<'a,B>wherefor<'b>B:'b;fnfrom_owned_repr<B,D:Dimension>(arr:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>wherefor<'b>B:'b{        arr.into()}}

EDIT

It looks like the HRBfor<'b> B: 'b implies'b: static. That's all well and good formapv_into_any() since we already need the static trait bound forTypeId to work. But it would not work well in general. You would indeed have the lifetime boiling up your data traits likeMappableData<'a>. (Forum link.)


To your broader points about balancing maintainability with ergonomics, my counterargument is:

  • Helper traits are not uncommon in the ndarray ecosystem -- consider ndarray-linalg.
  • If there are future GAT-powered versions ofRawDataSubst and your proposedDataSubst trait, thenMappableData could be made into a subtrait ofDataSubst without breaking anything.
  • Signaling to users thatMappableData is a helper formapv_into_any() rather than something more general would make it easier to modify in the future.

The final choice is yours. If you still prefer theDataSubst<A> route then I will do another push to modify the PR accordingly.

@akern40
Copy link
Collaborator

Thanks for giving me time to think about this! After a bit of experimenting this morning, I like your approach. I'd like to propose a few changes:

  1. Let's renameMappableData toDataMappable, to make it conform withDataOwned,DataMut, etc.
  2. Let's implementDataMappable forall of the storage types.Array,ArcArray, andCowArray will map to themselves, and the view types (both raw and not) will map toArray. This change is necessary because otherwisemaps_into_any breaks even more by ceasing to be callable on views.
  3. I'd suggest the following language for the docs ofDataMappable (which should live indata_traits.rs), but feel free to alter as you see fit:
/// An array representation trait for mapping to an owned array with a different element type.////// Functions such as [`mapv_into_any`](ArrayBase::mapv_into_any) that alter an array's/// underlying element type often want to preserve the storage type (e.g., `ArcArray`)/// of the original array. However, because Rust considers `OwnedRepr<A>` and `OwnedRepr<B>`/// to be completely different types, a trait must be used to indicate what the mapping is.////// This trait will map owning storage types to the element-swapped version of themselves;/// view types are mapped to `OwnedRepr`. The following table summarizes the mappings:////// | Original Storage Type   | Corresponding Array Type | Mapped Storage Type | Output of `from_owned` |/// | ----------------------- | ------------------------ | ------------------- | ---------------------- |/// | `OwnedRepr<A>`          | `Array<A, D>`            | `OwnedRepr<B>`      | `Array<B, D>`          |/// | `OwnedArcRepr<A>`       | `ArcArray<A, D>`         | `OwnedArcRepr<B>`   | `ArcArray<B, D>`       |/// | `CowRepr<'a, A>`        | `CowArray<'a, A, D>`     | `CowRepr<'a, B>`    | `CowArray<'a, B, D>`   |/// | `ViewRepr<&'a A>`       | `ArrayView<'a, A, D>`    | `OwnedRepr<B>`      | `Array<B, D>`          |/// | `RawViewRepr<*const A>` | `RawArrayView<A, D>`     | `OwnedRepr<B>`      | `Array<B, D>`          |pubtraitDataMappable:RawData{/// The element-swapped, owning storage representationtypeSubst<B>:Data<Elem =B> +DataOwned;/// Cheaply convert an owned [`Array<B, D>`] to a different owning array type, as dictated by `Subst`.////// The owning arrays implement `From`, which is the preferred method for changing the underlying storage./// This method (and trait) should be reserved for dealing with changing the element type.fnfrom_owned<B,D:Dimension>(self_:Array<B,D>) ->ArrayBase<Self::Subst<B>,D>;}

@akern40
Copy link
Collaborator

If you can putDataMappable indata_traits.rs, implement it as described in the table (where I've left of theMut versions of the types for brevity, but they should get implementations too), and run format, I would be thrilled to merge this and include it in the next release ofndarray.

@benkay86
Copy link
ContributorAuthor

Thank you for your patience. I have finally had a chance to work on this. Please let me know what you think.

  • I am not sure how to go about writing a sound implementation ofmapv_into_any() forRawViewRepr. If you want, I can still implementDataMappable forRawViewRepr.
  • There is no advantage to implementingmapv_into_any() forViewRepr (except for API consistency, which is kind of important). If you have anArrayView then it will be no more performant thanmapv(). If you have anArrayViewMut and you are mapping between two identical types thenmap_inplace() is what you want. I have noted this in the documentation. Furthermore, supporting this makes the implementation ofmapv_into_any more complicated because it permits returning a data representation that is different than the input representation (e.g.ViewRepr ->OwnedRepr). Considernot implementingMappableData forViewRepr to encourage users to do the right thing.
  • I am not sure how to auto-format my code correctly. Runningcargo fmt on the stable channel changes many lines of code that are not mine, so it would appear some other auto-format tool is being used by the project.

@akern40
Copy link
Collaborator

@benkay86 now it's my turn to thank you - I had to get our BLAS CI/CD situation handled before I could go back to other PRs.

I like all of your suggestions - let's leave theDataMappable implementation to only owned types. The formatter iscargo +nightly fmt.

You've done tremendous work here. If you would like me to cover the last mile I'd be happy to make the necessary changes.

@benkay86
Copy link
ContributorAuthor

Thank you for kind words@akern40! I am happy to make the requested changes unless there is some tight release deadline that I am getting in the way of. After having (yet more) time to think about this, I have two thoughts I want to run by you.

Number One

It turns out there are two roughly-equally-efficient ways to implementmapv_into_any(). The current version converts the input data representation toOwnedRepr, delegates tomapv_into(), and then converts back to the original data representation fromOwnedRepr. This is entirely safe, butmight involve a little bit of overhead when repackaging the pointer to the data. Then it transmutes from data typeA to data typeB, which is safe becauseA andB are the same type, therefore<S as DataMappable>::Subst<A> must be the same type as<S as DataMappable>::Subst<B>.

The other implementation is to call mapv_into() on the input data representation directly. Then it transmutes fromS to<S as DataMappable>::Subst<B>. This is actuallyunsound because whileA andB are the same type,S and<S as DataMappable>::Subst<_> might not be the same data representation. For example, if the input is aViewRepr then the output is anOwnedRepr and we can't just transmute between the two.To make the transmute sound, we should makeDataMappable unsafe to implement and document that the input data representation must always be the same as the output representation.

Do you have a preference between these two implementation strategies?

Number Two

As a final point for thought, I wonder if this PR should be merged at all, or if we should mark the original issue#1280 as won't fix.@RReverser never did give us an example of where he thought returning anArcArray would make his code more efficient. He wrote:

It would be great if, like mapv_into, it was generic over storage types and allowed to clone ArcArray in-place when no copy is necessary.

In fact, theexisting implementation ofmapv_into_any() already does this:

let output =self.mapv_into(f).into_owned();

For anArcArray this ends up callingthis implementation ofinto_owned(), which repackages the memory of the existing, uniquely-ownedArcArray in place. Performing a transmute might save a little bit of overhead here (if the compiler doesn't optimize it out), but there is no wholesale reallocation going on behind the scenes. It's asimilar story forCowArray.

Pros of this PR: obviously the current situation is confusing to users, and this PR creates some syntactic/generic sugar to makemapv_into_any() more ergonomic to use with shared data. If we don't merge this then, at the very least, we ought to document that callingmapv_into_any() onArcArray orCowArray will not unnecessarily clone the data when converting toArray.

Cons: Rust struggles with trait solving and generic return types in particular. Adding a generic return type and theDataMappable trait make the learning curve that much steeper for ndarray and authors of generic functions in particular.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@blussblussbluss left review comments

@adamreicholdadamreicholdadamreichold left review comments

@akern40akern40akern40 requested changes

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@benkay86@akern40@bluss@adamreichold

[8]ページ先頭

©2009-2025 Movatter.jp