Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Resample RGB images in C++#29453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
QuLogic wants to merge4 commits intomatplotlib:main
base:main
Choose a base branch
Loading
fromQuLogic:resample-rgb

Conversation

QuLogic
Copy link
Member

@QuLogicQuLogic commentedJan 11, 2025
edited
Loading

PR summary

Agg already has RGB resampling with output to RGBA builtin, so we just need to correctly wire up the corresponding templates. With this RGB resampling mode, we save the extra copy from RGB to RGBA in NumPy land that was required for the previous always-RGBA resampling.

With the example from#29434, this saves 800MB of peak memory usage,which actually works out totwo copies of 10000x10000x4 bytes; I'm not sure where the second copy comes from. I was thinking it was 4-byte RGBA, but it's actually 8-byte float64 input, so 1 copy removed as expected.

While this does pass tests, I think there may be some room for further improvement, as the input to the C++ code isA[..., :3], which should be a discontiguous view, but then the pybind11 code forces it to be contiguous, which would make another copy. There are some offset and step settings in Agg that I think we can use to avoid this copy as well, but it might require extra template expansions. I have implemented this to save another copy.

PR checklist

jkittner reacted with thumbs up emoji
Agg already has RGB resampling with output to RGBA builtin, so we justneed to correctly wire up the corresponding templates. With this RGBresampling mode, we save the extra copy from RGB to RGBA in NumPy landthat was required for the previous always-RGBA resampling.
In the case of RGBA, the RGB and A channels are resampled separately,but they are created as a view on the original to pass to the C++ code.The C++ code then copies it to a contiguous buffer, but Agg's RGBresampler supports manually stepping the RGB input by a custom stride.As this step is a template parameter, we can't handle any arbitrarayarray, but can special case steps of 3 or 4 units, which should coverthe common cases of RGB or RGBA-viewed-as-RGB input.
@tacaswelltacaswell added this to thev3.11.0 milestoneJan 17, 2025
@QuLogic
Copy link
MemberAuthor

Usingmemray on the example from#29434, we can find 3 major memory contributors:

  1. A (800MB) copy madeinImageBase.set_data; this one is expected as we don't want to use or modify external data.
  2. A (3.2GB) copy madeinColormap._get_rgba_and_mask; this is part of the colour mapping process and expected (though for the old interpolation stage, this originally happened after resampling, so was smaller.)
  3. A (3.2GB) copy madein_rgb_to_rgba in order to make an RGBA array for resampling in C++ withjust the RGB from point 2.

Usingpyinstrument, we can find that most processing time is taken up byAxesImage._make_image, specifically_to_rgba (41.5%),_rgb_to_rgba (40.5%), and_resample (16.4%).

Screenshot 2025-01-23 at 06-12-16 22 8s - issue29434 py - pyinstrument

The first commit here removes entry 3 by applying Agg templates for RGB, but we still end up allocating a similar size one in_resample (to make a contiguous array), but without the alpha channel, so only 2.4GB, meaning a drop of 800MB. Instrumentation doesn't show much change in runtime.

The second commit removes the need for that latter copy as well by applyingmore Agg templates for step size of 3 or 4. Strangely though, memray now attributesmore allocations toColorizer.to_rgba 4.7GiB vs 3.0 GiB before. Insrtumentation does show a good improvement in runtime:

Screenshot 2025-01-23 at 06-23-50 14 7s - issue29434 py - pyinstrument

I thinkmemray must only show the blocks that contribute to peak usage (which explains why the attribution changes around a bit after the commits). Reported peak memory usage shrank from 6.8 GiB to 5.6 GiB (17.6% decrease), and runtime (ofAxes._make_image specifically) from 19.146s to 12.467s (34.9% decrease).

There are a few other copies that I can find by inspection:

  1. The initial data is 10000x10000xfloat=800M bytes, though since it's passed toimshow directly, it gets immediately discarded.
  2. A copyof the input in_Colormap._get_rgba_and_mask; this is so that it can be modified (with a byteswap or scaling from float to int).
  3. Three masks for over/under/bad, but these are booleans, not float64.
  4. Acast fromfloat toint; this is used for the indexing into the lut.

Copies 1, 2, and 4 are 8 bytes, so 800M each, and copy 3 should be 3*byte = 300M. Many of these are replacing the previous version, or otherwise discarded, so I guess they don't count formemray.

I believe we can remove copies 2-4 with a port of_get_rgba_and_mask to C++, in order to directly output the final RGBA without any intermediaries. I didn't implement the alpha part yet, but a quick implementation shows this will cut down peak memory usage to 4.6 GiB (which is probably as low as we can go for original, RGBA, and resampled), and runtime of_get_rgba_and_mask itself from 9.757s to 1.822s.

I haven't committed that last thing, as I didn't finish alpha processing and wasn't sure if we wanted to move it to C++, but I think the numbers indicate this is a good idea.

@timhoffm
Copy link
Member

On a side note,Colormap._get_rgba_and_mask has the peculiar behavior that integers are used to directly index into the lut, whereas floats are scaled before lookup so that 0...1 covers the color range (#28198). We want to move away from this anyway, basically deprecate accepting ints here. But we have to decide whether we need this integer-lookup capability and if so, how to make a reasonable API for it. Thoughts welcome.

@anntzer
Copy link
Contributor

anntzer commentedMar 23, 2025
edited
Loading

I think(?)#29776 would essentially remove the need for this PR (as it removes the use of _rgb_to_rgba).

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers
No reviews
Assignees
No one assigned
Projects
None yet
Milestone
v3.11.0
Development

Successfully merging this pull request may close these issues.

4 participants
@QuLogic@timhoffm@anntzer@tacaswell

[8]ページ先頭

©2009-2025 Movatter.jp