
TST: Calculate RMS and diff image in C++ #29102


Merged

story645 merged 1 commit into matplotlib:main from QuLogic:cpp-rms

Jun 19, 2025


+110 −9

Conversation


Member

QuLogic commented Nov 8, 2024

PR summary

The current implementation is not slow, but uses a lot of memory per image.

In `compare_images`, we have:

- one actual and one expected image as uint8 (2× image)
- both converted to int16 (though the original is thrown away) (4×)

which adds up to 4× the image allocated in this function.

Then it calls `calculate_rms`, which has:

- a difference between them as int16 (2×)
- the difference cast to 64-bit float (8×)
- the square of the difference as 64-bit float (though possibly the original difference was thrown away) (8×)

which at its peak has 16× the image allocated in parallel.

If the RMS is over the desired tolerance, then `save_diff_image` is called, which:

- loads the actual and expected images _again_ as uint8 (2× image)
- converts both to 64-bit float (throwing away the original) (16×)
- calculates the difference (8×)
- calculates the absolute value (8×)
- multiplies that by 10 (in-place, so no allocation)
- clips to 0-255 (8×)
- casts to uint8 (1×)

which at peak uses 32× the image.

So at their peak, `compare_images`→`calculate_rms` will have 20× the image allocated, and then `compare_images`→`save_diff_image` will have 36× the image allocated. This is generally not a problem, but in resource-constrained environments like WASM, it can sometimes run out of memory just in `calculate_rms`.
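The totals above are the sum of each function's stated peak. The short tally below re-does that arithmetic and converts it to bytes. The image size is an assumption for illustration only (a 640x480 RGB image at one byte per channel), and `peak_multiplier` is a hypothetical helper, not part of matplotlib.

```python
# Peak allocation per function, in multiples of one uint8 image,
# copied from the description above.
PEAK = {
    "compare_images": 4,
    "calculate_rms": 16,
    "save_diff_image": 32,
}


def peak_multiplier(callee):
    """Peak multiple while compare_images is inside `callee`."""
    return PEAK["compare_images"] + PEAK[callee]


# Assumption: a 640x480 RGB image, one byte per channel.
image_bytes = 640 * 480 * 3

for callee in ("calculate_rms", "save_diff_image"):
    mult = peak_multiplier(callee)
    print(f"{callee}: {mult}x = {mult * image_bytes / 1e6:.1f} MB")
```

Under that assumed image size, this prints 20x = 18.4 MB for `calculate_rms` and 36x = 33.2 MB for `save_diff_image`.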

This implementation in C++ always allocates the diff image, even when not needed, but doesn't have all the temporaries, so it's a maximum of 3× the image size (plus a few scalar temporaries).
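To illustrate why a single pass needs no image-sized temporaries, here is a minimal sketch of the idea. It is written in plain Python rather than the PR's C++, and it is not the code from this PR: `rms_and_diff` is a hypothetical name, and the diff scaling (absolute difference times 10, clipped to 0-255) is assumed to match what the description says `save_diff_image` does. The two inputs plus the one output buffer account for the 3× figure.

```python
import math


def rms_and_diff(expected, actual):
    """Return (rms, diff) for two equal-length uint8 buffers in one pass."""
    if len(expected) != len(actual):
        raise ValueError("buffers must have the same length")
    diff = bytearray(len(expected))  # the only image-sized allocation
    total = 0  # scalar accumulator for the sum of squared differences
    for i, (e, a) in enumerate(zip(expected, actual)):
        d = e - a
        total += d * d
        # Assumed scaling: |d| * 10, clipped to the uint8 range.
        diff[i] = min(abs(d) * 10, 255)
    rms = math.sqrt(total / len(expected)) if expected else 0.0
    return rms, diff


# Tiny hand-made buffers standing in for flattened image data.
rms, diff = rms_and_diff(bytes([0, 10, 255, 100]), bytes([0, 13, 250, 0]))
print(rms, list(diff))
```

Each element is read once, contributes to the running sum, and is written straight into the output, so nothing larger than a scalar is held between iterations.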

PR checklist

[n/a] "closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
[n/a] Plotting related features are demonstrated in an example
[n/a] New Features and API Changes are noted with a directive and release note
[n/a] Documentation complies with general and docstring guidelines

QuLogic added the topic: testing and Performance labels

Nov 8, 2024

github-actions bot added the topic: images label

Nov 8, 2024

QuLogic mentioned this pull request

Nov 8, 2024

Add wasm CI #29093

Open

4 tasks

github-actions bot added the status: needs rebase label

Jan 4, 2025


Member, Author

QuLogic commented Jun 4, 2025

So I no longer have any memory-based skips on the PR adding WASM, but maybe we still want to do this to save memory in general?


Member

oscargus commented Jun 5, 2025

This seems to make sense!

Should we also use this in `compare_rms`? Or deprecate that?

oscargus approved these changes

Jun 5, 2025


story645 reviewed

Jun 13, 2025


lib/matplotlib/testing/compare.py (outdated, resolved)

src/_image_wrapper.cpp (outdated, resolved)

src/_image_wrapper.cpp (resolved)

QuLogic force-pushed the cpp-rms branch from 62a96ff to 3262fc8

June 18, 2025 00:09

github-actions bot removed the status: needs rebase label

Jun 18, 2025


Member, Author

QuLogic commented Jun 18, 2025 (edited)

Put together a small benchmark:

```python
import timeit
import tracemalloc

import numpy as np
from PIL import Image

for N in [100, 555, 1000, 2000]:
    N = int(N)
    # Write two random N x N RGB images to compare.
    for name in ['expected', 'actual']:
        image = (np.random.random((N, N, 3)) * 255).astype(np.uint8)
        Image.fromarray(image).save(f'{name}{N}.png')
        del image

    tracemalloc.start()
    timer = timeit.Timer(
        f'compare_images("expected{N}.png", "actual{N}.png", 0)',
        setup='from matplotlib.testing.compare import compare_images')
    runtime = timer.autorange()
    # Print the size, the time per call, and the peak traced memory.
    print(N, runtime[1] / runtime[0], tracemalloc.get_traced_memory()[1])
    tracemalloc.stop()
```

which prints runtime and peak memory for image sizes up to 2000x2000. I think our largest test image is probably around 1800x900, and 555x555 has approximately the same number of pixels as 640x480 (the default figure size).
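For readers unfamiliar with the two standard-library calls the benchmark indexes into, here is a small stand-alone sketch. `Timer.autorange()` returns `(number_of_loops, total_seconds)` and `tracemalloc.get_traced_memory()` returns `(current_bytes, peak_bytes)`. The `measure` helper and the `bytearray` workload are placeholders for illustration, not part of the benchmark above.

```python
import timeit
import tracemalloc


def measure(stmt, setup="pass"):
    """Return (seconds per call, peak traced bytes) for a statement."""
    tracemalloc.start()
    try:
        # autorange() returns (number_of_loops, total_seconds).
        loops, total = timeit.Timer(stmt, setup=setup).autorange()
        # get_traced_memory() returns (current_bytes, peak_bytes).
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return total / loops, peak


# Placeholder workload: allocate a 1 MB buffer on every call.
per_call, peak = measure("bytearray(1_000_000)")
print(f"{per_call:.2e} s per call, peak {peak} bytes")
```

Note that `tracemalloc` only sees allocations made through Python's allocators, and tracing adds overhead, so the timings are best read as relative rather than absolute.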

Time-wise, this probably doesn't work out to a lot: maybe about 80% of the previous runtime at the default figure size, and maybe 55-60% at the larger end.

But memory-wise, we're at 10% of the previous peak at the default figure size, and even less for larger figures.

story645 reviewed

Jun 18, 2025


src/_image_wrapper.cpp (resolved)

QuLogic force-pushed the cpp-rms branch from 3262fc8 to 80aa299

June 18, 2025 20:23

TST: Calculate RMS and diff image in C++

b13e31a


QuLogic force-pushed the cpp-rms branch from 80aa299 to b13e31a

June 18, 2025 20:44

story645 approved these changes

Jun 18, 2025


Member

story645 left a comment


Not merging b/c you keep pushing, but you're welcome to merge when you're done tweaking. The memory improvements look awesome!


Member, Author

QuLogic commented Jun 19, 2025

That was just fixing stubtest; it should be good now.

story645 merged commit e325459 into matplotlib:main

Jun 19, 2025

39 of 41 checks passed

QuLogic deleted the cpp-rms branch

June 19, 2025 19:01

Labels

Performance, topic: images, topic: testing

3 participants
