Store vertices as (masked) arrays in Poly3DCollection #16688
Conversation
lib/mpl_toolkits/mplot3d/proj3d.py Outdated
```
Parameters
----------
vecs : 3xN np.ndarray or np.ma.MaskedArray
```
eric-wieser commented on Mar 11, 2020 (edited)
It might make more sense for this to take and return a `(..., 3)` array, which would mean your caller does not need to reshape, and would make the below...

(Doing it this way is consistent with how gufuncs behave in numpy, in that they care only about the last axis by default.)
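As a standalone sketch of that convention (hypothetical helper, not code from this PR): a transform that treats the last axis as the coordinate axis broadcasts over any leading shape, so callers never need to reshape or transpose:

```python
import numpy as np

def transform_last_axis(vecs, M):
    """Apply a 3x3 linear map to points stored as (..., 3).

    Illustration only: matmul against M.T acts on the last axis,
    like a gufunc, so the leading axes broadcast for free.
    """
    return vecs @ M.T

pts = np.arange(24, dtype=float).reshape(2, 4, 3)  # (n_faces, n_verts, 3)
M = 2.0 * np.eye(3)                                # simple scaling map
out = transform_last_axis(pts, M)
assert out.shape == (2, 4, 3)      # leading shape preserved
assert np.allclose(out, 2.0 * pts)
```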
Note that those reshapes should not trigger any reallocations, so I'm not sure if that would actually help. We could push the transposes into this function, but we cannot make them go away without changing the conventions around `renderer.M`. Still, I don't think they cause any harm here.
eric-wieser commented on Mar 11, 2020 (edited)
This suggestion is based more on simplicity and convention than on performance. You're correct that reshapes are typically free, and `.T` is always free.
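As an aside (not part of the patch), `np.shares_memory` makes it easy to check which of these operations are views and which ones reallocate:

```python
import numpy as np

a = np.ones((3, 4))  # owns its data
# .T is always a view: it only swaps shape and strides
assert np.shares_memory(a.T, a)
# reshaping a C-contiguous array is also just a view, no reallocation
assert np.shares_memory(a.reshape(-1), a)
# but reshaping the transpose cannot be expressed with strides: it copies
assert not np.shares_memory(a.T.reshape(-1), a)
```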
I've tried your suggestions, and while the original version takes ~72ms on average, the proposed changes increase the runtime to ~127ms. This is likely due to strided memory access patterns that start to pop up everywhere (padding the array, normalizing the product, reductions over the z axis). I do agree that the code is nicer, though. Please let me know how to proceed.
Thanks for profiling. Is that the runtime of the entire plot, or just this function?
eric-wieser commented on Mar 11, 2020 (edited)
Seems that something screwy is going on with `matmul(A, B)` vs `matmul(B.T, A.T).T` for 2d `A` and `B`. `matmul` is supposed to know they are equivalent and pick the faster one, but it seems to be doing the same thing for both.

Edit: numpy/numpy#15742, it only works with an explicit `out` argument.
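For reference, the identity `matmul` could exploit here (a small self-contained check, not code from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((4, 3))
# (A @ B) and (B.T @ A.T).T are the same matrix; only the memory
# layout of the computation differs, which is what a smart kernel
# selection could take advantage of.
C1 = A @ B
C2 = (B.T @ A.T).T
assert C1.shape == (5, 3)
assert np.allclose(C1, C2)
```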
How large a part of the slowdown is caused by matmul picking the wrong kernel (or having a slow kernel) for this case, then?
eric-wieser commented on Mar 11, 2020 (edited)
I can get a speedup with this mess:
```python
def _proj_transform_vectors_new(vecs, M):
    is_masked = isinstance(vecs, np.ma.MaskedArray)
    if is_masked:
        mask = vecs.mask
    # allocate with a convenient memory order
    vecs_pad = np.moveaxis(
        np.empty((vecs.shape[-1] + 1,) + vecs.shape[:-1]), 0, -1)
    vecs_pad[..., :-1] = vecs
    vecs_pad[..., -1] = 1
    product = np.empty_like(vecs_pad)
    np.matmul(vecs_pad, M.T, out=product)
    result = product[..., :3] / product[..., 3:]
    if is_masked:
        return np.ma.array(result, mask=mask)
    else:
        return result

def new():
    # reshape is for speed only, it works correctly without
    psegments = _proj_transform_vectors_new(
        segments.reshape(-1, 3), M).reshape(segments.shape)
    epilogue(psegments)
    return psegments
```
This indeed yields a speedup, but a tiny one on my laptop:
```
New:      42.17ms
Baseline: 45.15ms
Diff:      2.98ms
```
I'm ok with going either way, just let me know 🤷♂
I think my request would perhaps be just to move the reshapes you have at the call site inside the function - that way you can keep the conventional `(..., N)` API, but the implementation can be the easy-to-read and efficient code you have now.
It would be great if we could merge this, because it is necessary to actually benefit from the patches I'm preparing now.
Force-pushed from 872818b to fbee126.

@eric-wieser @timhoffm I've addressed your requests, rebased on top of master and squashed the commits.
lib/mpl_toolkits/mplot3d/art3d.py Outdated
```python
face_z = self._zsortfunc(psegments[..., 2], axis=-1)
if isinstance(face_z, np.ma.MaskedArray):
    # NOTE: Unpacking .data is safe here, because every face has to
    # contain a valid vertex.
    face_z = face_z.data
face_order = np.argsort(face_z, axis=-1)[::-1]
```
Is the point here that `face_z` has a single element per face, and is therefore safe?
Exactly. My comment is based on the assumption that you don't have faces of length 0 (i.e. faces with all vertices masked). I think that's reasonable?
Alternatively, you could use `face_z.filled(-np.inf)`, to be safe.
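A tiny sketch of what that fallback buys (illustrative values, not from the PR): masked face depths become `-inf`, so after the descending reorder they sort to the back instead of producing undefined ordering:

```python
import numpy as np

# Fill masked face depths with -inf before sorting.
face_z = np.ma.array([3.0, 1.0, 2.0], mask=[False, True, False])
order = np.argsort(face_z.filled(-np.inf))[::-1]
assert order.tolist() == [0, 2, 1]  # masked face (index 1) drawn last
```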
Now that I think about it, I don't think that it matters? Faces with no valid vertices should get all vertex coordinates filled with NaN values before drawing, making them invisible, so it doesn't matter where they end up in the face order.
lib/mpl_toolkits/mplot3d/art3d.py Outdated
```python
# NOTE: Unpacking .data is safe here, because every face has to
# contain a valid vertex.
```
Are you confident enough about this to add `assert not face_z.mask.any()`? If not, then it would be better to handle it.
See my comment above. I would be comfortable with adding this assert if we're ok with making an assumption that you can't have faces of length 0. This certainly doesn't happen for `plot_surface`, and I don't see why you would do that in any other case either.
lib/mpl_toolkits/mplot3d/art3d.py Outdated
```python
if self._codes3d is not None:
    codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
    segments_2d = [s.compressed().reshape(-1, 2)
```
eric-wieser commented on Mar 22, 2020 (edited)
`compressed` is going to do something bad if somehow only one component was masked (e.g. it was `nan` at construction). Safer would be:
```diff
-segments_2d = [s.compressed().reshape(-1, 2)
+segments_2d = [s.data[~s.mask.any(axis=-1)]
```
How would you like to interpret the data if one component was masked? That makes no sense because you get one point with only a single coordinate.
Oh, you mean that multiple similar errors like this could cancel out? That is true, but it's such a strong internal invariant violation that I'm not sure if what your solution does (i.e. just take masked data without asking too many questions) is better? Another alternative would be to do `s.data[~s.mask[..., 0]]`, as the last dimension of the mask is always supposed to be constant.
eric-wieser commented on Mar 22, 2020 (edited)
You're right it makes no sense, but the way you have it written the behavior would be to interpret `[[x1, y1], [x2, --], [x3, y3], [x4, --]]` as `[[x1, y1], [x2, x3], [y3, x4]]`, which is going to be very confusing to the user as it combines adjacent points.
Using `any` as I do means that if the point is fully masked, it's skipped; otherwise it propagates to the plotting anyway. Using `all` would also be fine, if you want partial masks to skip plotting too. What's not ok is corrupting adjacent points which aren't bad.
If you really wanted to play it safe you could assert that `np.all(mask.any(axis=-1) == mask.all(axis=-1))`, but that would cost you a bunch of time.
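A self-contained demonstration of the hazard (illustrative data): with two partially masked points, `compressed()` silently stitches leftover coordinates together, while `mask.any(axis=-1)` indexing drops whole points:

```python
import numpy as np

s = np.ma.array(
    [[1., 10.], [2., 20.], [3., 30.], [4., 40.]],
    mask=[[False, False], [False, True], [False, False], [False, True]],
)
# compressed() flattens the surviving scalars and re-pairs them,
# merging coordinates from adjacent points:
bad = s.compressed().reshape(-1, 2)
assert bad.tolist() == [[1., 10.], [2., 3.], [30., 4.]]
# indexing with mask.any(axis=-1) keeps only fully valid points:
good = s.data[~s.mask.any(axis=-1)]
assert good.tolist() == [[1., 10.], [3., 30.]]
```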
Ok, I guess using `mask.all` might be more reasonable. Would that be ok with you?
I've looked into it, and I'm not sure if we should be doing anything like that. Note that right below this line there is another comprehension that builds up a list of codes, each with length that is supposed to be equal to the length of this segment. So we can't just drop vertices arbitrarily and the only way to make this sane is to assume that the invariant I've mentioned holds. What do you think?
Please fix the flake8 errors.
Force-pushed from 41f36ac to 1e08084.

lib/mpl_toolkits/mplot3d/art3d.py Outdated
```python
if self._codes3d is not None:
    codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
    segments_2d = [s.data[~s.mask.all(axis=-1)]
```
With my implementation in the constructor, you now have the invariant that `mask.all(axis=-1) == mask[:, 0]`, which if you wanted to you could use for a very marginal speedup:
```diff
-segments_2d = [s.data[~s.mask.all(axis=-1)]
+# this invariant is ensured by the constructor
+assert s.mask.strides[1] == 0
+# which means that s.mask[:, i] is all the same location in memory,
+# so there's no need to call `all`.
+segments_2d = [s.data[~s.mask[:, 0]]
```
Or I could put in the assert and use `compressed()`, which avoids the unnecessary mask manipulation!
eric-wieser commented on Mar 22, 2020 (edited)
`compressed` is internally more complex than that operation anyway though. You're comparing `data[~mask[:, 0]]` with `data.ravel()[mask.ravel()].reshape((-1, 2))`.
In fact, `mask.ravel()` is now a lot more expensive than `mask[:, 0]`, as it has to create a copy to handle the 0 stride.
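The 0-stride situation being discussed can be reproduced in isolation (standalone sketch, not code from the PR):

```python
import numpy as np

# Broadcasting a per-point mask across the coordinate axis gives a
# 0-stride second axis: every column aliases the same bytes.
mask1d = np.array([False, True, False])
mask2d = np.broadcast_to(mask1d[:, None], (3, 2))
assert mask2d.strides[1] == 0
assert (mask2d[:, 0] == mask2d[:, 1]).all()
# ravel() cannot represent the 0 stride, so it must materialize a copy
assert not np.shares_memory(mask2d.ravel(), mask1d)
```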
Ok, here's what I came up with:
```diff
@@ -667,6 +667,7 @@ class Poly3DCollection(PolyCollection):
             self._facecolors3d = self._facecolors

         psegments = proj3d._proj_transform_vectors(self._segments, renderer.M)
+        is_masked = isinstance(psegments, np.ma.MaskedArray)
         num_faces = len(psegments)

         # This extra fuss is to re-order face / edge colors
@@ -681,19 +682,24 @@ class Poly3DCollection(PolyCollection):
             cedge = cedge.repeat(num_faces, axis=0)

         face_z = self._zsortfunc(psegments[..., 2], axis=-1)
-        if isinstance(face_z, np.ma.MaskedArray):
+        if is_masked:
             # NOTE: Unpacking .data is safe here, because every face has to
             # contain a valid vertex.
             face_z = face_z.data
         face_order = np.argsort(face_z, axis=-1)[::-1]

+        segments_2d = psegments[face_order, :, :2]
         if self._codes3d is not None:
-            segments_2d = [s.data[~s.mask.all(axis=-1)]
-                           for s in psegments[face_order, :, :2]]
+            if is_masked:
+                # NOTE: We cannot assert the same on segments_2d, as it is a
+                # result of advanced indexing, so its mask is a newly
+                # allocated tensor. However, both of those asserts are
+                # equivalent.
+                assert psegments.mask.strides[-1] == 0
+                segments_2d = [s.compressed().reshape(-1, 2)
+                               for s in segments_2d]
             codes = [self._codes3d[idx] for idx in face_order]
             PolyCollection.set_verts_and_codes(self, segments_2d, codes)
         else:
-            segments_2d = psegments[face_order, :, :2]
             PolyCollection.set_verts(self, segments_2d, self._closed)

         self._facecolors2d = cface[face_order]
```
Those `.ravel()`s are no-ops, because `segments_2d` comes from advanced indexing anyway, so I still went with `compressed()`.
This removes a bunch of very expensive list comprehensions, dropping rendering times by half for some 3D surface plots.
Force-pushed from 1e08084 to 8633bbd.

Restarting an earlier conversation:
Where exactly does this vectorized reduction occur? An option might be to store:

```python
self._segment_3d_data = ...  # (n_faces, max_verts, 3)
self._vertex_mask = ...      # (n_faces, max_verts, 1)
```

and reassemble them into a masked array only when needed. The assembly should be free, and via …
apaszke commented on Mar 22, 2020 (edited)
This is the reduction. I think that keeping the mask along with the array really does make the code simpler, and I don't think it should add a lot of performance penalties (it should be waaaay better than without this patch). I do agree however that it may be simpler to keep the invariants if we separate those. I'm ok with doing that if you think that's the way to go. No strong preference from my side anymore.
eric-wieser commented on Mar 22, 2020 (edited)
If that's the only place, then you could consider

```python
if isinstance(segments3d, np.ma.MaskedArray):
    self._segment_3d = segments3d.data                  # (n_faces, max_verts, 3)
    self._vertex_valid = ~segments3d.mask.any(axis=-1)  # (n_faces, max_verts)
elif isinstance(segments3d, np.ndarray):
    self._segment_3d = segments3d                       # (n_faces, max_verts, 3)
    self._vertex_valid = np.True_
else:
    ...  # existing code
```

without the `broadcast_to`, and an inverted condition:

```python
vertex_z = psegments[..., 2]
if self._vertex_valid is not np.True_:
    vertex_z = np.ma.array(psegments[..., 2], mask=~self._vertex_valid)
face_z = self._zsortfunc(vertex_z, axis=-1)

# the if is an optimization to avoid an unnecessary copy,
# it would work unconditionally
if self._vertex_valid is not np.True_:
    segments_2d = [s[self._vertex_valid, :] for s in segments_2d]
```

That second snippet becomes even simpler and 4x faster (on an array of 10000 elements) once the numpy version bumps:

```python
vertex_z = psegments[..., 2]
face_z = self._zsortfunc(vertex_z, axis=-1, where=self._vertex_valid)
```
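For clarity, the semantics of the `where=` reduction form being discussed (small sketch with made-up values; `initial=` is needed so rows with no valid entries still have a defined result):

```python
import numpy as np

z = np.array([[1., 5., 2.],
              [7., 3., 9.]])
valid = np.array([[True, True, False],
                  [True, False, True]])
# Reduce only over valid vertices; no masked array is constructed.
face_z = np.max(z, axis=-1, where=valid, initial=-np.inf)
assert face_z.tolist() == [5.0, 9.0]
```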
Yeah I think that's fair and makes things nicer in the end. I'll update the code tomorrow!
My only question is: why use …
eric-wieser commented on Mar 23, 2020 (edited)
I guess there's no reason to use …
@eric-wieser Turns out it's not as simple as we thought — see the last commit. There's one extra reduction over the Z axis at the very end of the function, and we always have to take care of masking before we call …
```python
pzs = pfaces[..., 2]
if needs_masking:
    pzs = np.ma.MaskedArray(pzs, mask=self._invalid_vertices)
```
Tempting to move this down to line 692, where it's used for the first time.
I felt like it's logically connected to `pfaces` and it's used on lines 692, 725 and 729, so I didn't want to make the first use special in any way.
I suggested this because it might be worth adding an optimization today along the lines of:
```python
if numpy_version >= LooseVersion("1.17.0"):
    ...  # use the where argument, skip the masked array entirely
else:
    ...
```
Happy to have that declared as out of scope
I'm happy to do that, but I'm still a bit afraid that it will make the code even less readable. I'd have to insert the same `if` on lines 692 and 729. WDYT?
Maybe try it and see if it actually helps with performance locally in any significant way? If not, then I think it's not worth it.
I've tried this, and the `where=` reductions are indeed faster!

```python
In [1]: import numpy as np
In [2]: maskf = np.zeros((399*399, 4), dtype=bool)
In [3]: maskt = np.ones((399*399, 4), dtype=bool)
In [4]: x = np.ones((399*399, 4), dtype=float)
In [5]: mx = np.ma.MaskedArray(x, maskf)
In [6]: %timeit np.max(mx, axis=-1)
10.3 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: %timeit np.max(x, axis=-1, where=maskt, initial=0)
7.09 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
On the other hand, it looks like `np.average` doesn't support the `where=` argument, so I don't think we can do this...
Oh, that's a good catch. You can *almost* get away with passing the mask into the `weights` argument of `np.average`, but that behaves poorly on NaNs.
Hah, good point! And it looks like it still beats the masked array implementation, which is very surprising to me:

```python
In [9]: %timeit np.average(x, axis=-1, weights=maskt)
8.69 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %timeit np.average(mx, axis=-1)
12.9 ms ± 370 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
But yes, there's an issue with invalid floating point values, and there's no default argument we can pass in. I guess the latter problem could be fixed by using a wrapper like `lambda *args, **kwargs: np.average(*args, weights=kwargs.pop('where', 1.), **kwargs)` in the dict defining zsort functions.
So what's the final decision?
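A sketch of that wrapper idea as a named function (hypothetical `average_where`, not code from the PR; as noted above, NaNs under zero weights would still poison the sum):

```python
import numpy as np

def average_where(a, axis=None, where=None):
    """Hypothetical shim: emulate a `where=` argument for np.average
    by routing the validity mask into `weights=`."""
    weights = None if where is None else where.astype(float)
    return np.average(a, axis=axis, weights=weights)

z = np.array([[1., 3., 100.],
              [2., 4., 6.]])
valid = np.array([[True, True, False],
                  [True, True, True]])
out = average_where(z, axis=-1, where=valid)
assert out.tolist() == [2.0, 4.0]  # masked-out 100. is ignored
```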
Co-Authored-By: Eric Wieser <wieser.eric@gmail.com>
```python
                 for idx, ((xs, ys, zs), fc, ec)
                 in enumerate(zip(xyzlist, cface, cedge))),
                 key=lambda x: x[0], reverse=True)
face_z = self._zsortfunc(pzs, axis=-1)
```
eric-wieser commented on Mar 23, 2020 (edited)
Perhaps a marker for future cleanup would be handy here.

```diff
-face_z = self._zsortfunc(pzs, axis=-1)
+# TODO: When we drop numpy < 1.17, investigate using the `where`
+# argument to reductions
+face_z = self._zsortfunc(pzs, axis=-1)
```
Since matplotlib 3.5 we require numpy >= 1.17, so this can be updated immediately.
I was thinking of rebasing #11577, but it looks like this PR chain might obsolete it. What is the status on this PR, besides needing a rebase?
@apaszke Gentle ping?
scottshambaugh commented on Jan 7, 2025 (edited)
PR Summary
This removes a bunch of very expensive list comprehensions, dropping
rendering times by half for some 3D surface plots.
Runtime of the following code drops from 16s to 10s after #16675, while stacking this PR on top of #16675 lowers the time further to around 5s. Another 3s are now spent in `PolyCollection.set_verts`, which I'm hoping to improve later.

PR Checklist