pdf: Use explicit palette when saving indexed images #25824

Merged
ksunden merged 1 commit into matplotlib:main from QuLogic:fix-index-pdf
Jun 13, 2023

Conversation

QuLogic (Member) commented May 5, 2023 (edited)

PR summary

Asking Pillow for an "adaptive palette" does not appear to guarantee that the chosen colours will be the same, even if asking for exactly the same number as exist in the image. Instead, create an explicit palette, and quantize using it.

Additionally, since now the palette may be smaller than 256 colours, Pillow may choose to encode the image data with fewer than 8 bits per component, so we need to properly reflect that in the decode parameters (this was already done for the image parameters).
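As a hypothetical illustration of that bit-depth rule (assuming, per the PNG format, that indexed images may use 1, 2, 4, or 8 bits per sample, and that the encoder picks the smallest width that fits the palette; `indexed_bit_depth` is a name invented for this sketch):

```python
import math

def indexed_bit_depth(num_colors):
    # Smallest PNG-legal bit depth (1, 2, 4, or 8) whose index range
    # can address `num_colors` palette entries.
    needed = max(1, math.ceil(math.log2(num_colors)))
    return next(d for d in (1, 2, 4, 8) if needed <= d)

# Palettes of 2, 3, 5, 17, and 256 colours:
print([indexed_bit_depth(n) for n in (2, 3, 5, 17, 256)])  # [1, 2, 4, 8, 8]
```

The decode parameters in the PDF have to use this same width, or the index values read back from the stream will be misaligned with the palette.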

The effect on test images with *many* colours is small, with a maximum RMS of 1.024, but for images with few colours, the result can be completely wrong as in the reported #25806.

Since this bugginess requires a large image with few colours, and relies on possibly-fixable caching in Pillow, I opted to just update the existing test images instead; we should be a little more careful when changing them in the future.

Fixes #20575
Fixes #25806

PR checklist

ksunden previously approved these changes May 6, 2023
ksunden (Member) commented:

Looks like there is one additional failing test left unresolved:

lib/matplotlib/tests/test_image.py::test_figimage[pdf-False]

QuLogic (Member, Author) commented:

I can reproduce this on Ubuntu WSL, but it's for both suppressComposite settings. Confusingly, if I copy the result into the baseline, git shows no differences and the tests still fail. Not sure if there's a Ghostscript bug on Ubuntu.

QuLogic (Member, Author) commented:

Further testing shows that the results are in fact still wrong. Disabling the code for indexed images results in all the related tests failing.

Testing a fairly simple red-gradient image with pure Pillow, using the conversion attempted in this PR:

```python
import numpy as np
from PIL import Image

x = np.zeros((256, 1, 3), dtype=np.uint8)
x[:, 0, 0] = np.arange(256)
orig = Image.fromarray(x)

palette = Image.new('P', (1, 1))
palette.putpalette([comp for _, color in orig.getcolors() for comp in color])

new = orig.quantize(dither=Image.Dither.NONE, palette=palette)
print(*new.getdata())

y = np.asarray(new.convert('RGB'))
print(y[:, 0, 0])
np.testing.assert_array_equal(x, y)
```

results in an assertion failure:

```
255 255 255 255 251 251 251 251 247 247 247 247 243 243 243 243
239 239 239 239 235 235 235 235 231 231 231 231 227 227 227 227
223 223 223 223 219 219 219 219 215 215 215 215 211 211 211 211
207 207 207 207 203 203 203 203 199 199 199 199 195 195 195 195
191 191 191 191 187 187 187 187 183 183 183 183 179 179 179 179
175 175 175 175 171 171 171 171 167 167 167 167 163 163 163 163
159 159 159 159 155 155 155 155 151 151 151 151 147 147 147 147
143 143 143 143 139 139 139 139 135 135 135 135 131 131 131 131
127 127 127 127 123 123 123 123 119 119 119 119 115 115 115 115
111 111 111 111 107 107 107 107 103 103 103 103 99 99 99 99
95 95 95 95 91 91 91 91 87 87 87 87 83 83 83 83
79 79 79 79 75 75 75 75 71 71 71 71 67 67 67 67
63 63 63 63 59 59 59 59 55 55 55 55 51 51 51 51
47 47 47 47 43 43 43 43 39 39 39 39 35 35 35 35
31 31 31 31 27 27 27 27 23 23 23 23 19 19 19 19
15 15 15 15 11 11 11 11 7 7 7 7 3 3 3 3
[  0   0   0   0   4   4   4   4   8   8   8   8  12  12  12  12
  16  16  16  16  20  20  20  20  24  24  24  24  28  28  28  28
  32  32  32  32  36  36  36  36  40  40  40  40  44  44  44  44
  48  48  48  48  52  52  52  52  56  56  56  56  60  60  60  60
  64  64  64  64  68  68  68  68  72  72  72  72  76  76  76  76
  80  80  80  80  84  84  84  84  88  88  88  88  92  92  92  92
  96  96  96  96 100 100 100 100 104 104 104 104 108 108 108 108
 112 112 112 112 116 116 116 116 120 120 120 120 124 124 124 124
 128 128 128 128 132 132 132 132 136 136 136 136 140 140 140 140
 144 144 144 144 148 148 148 148 152 152 152 152 156 156 156 156
 160 160 160 160 164 164 164 164 168 168 168 168 172 172 172 172
 176 176 176 176 180 180 180 180 184 184 184 184 188 188 188 188
 192 192 192 192 196 196 196 196 200 200 200 200 204 204 204 204
 208 208 208 208 212 212 212 212 216 216 216 216 220 220 220 220
 224 224 224 224 228 228 228 228 232 232 232 232 236 236 236 236
 240 240 240 240 244 244 244 244 248 248 248 248 252 252 252 252]
Traceback (most recent call last):
  File "/home/elliott/code/matplotlib/foo.py", line 15, in <module>
    np.testing.assert_array_equal(orig, new.convert('RGB'))
  File "/var/container/conda/envs/mpl39/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 983, in assert_array_equal
    assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
  File "/var/container/conda/envs/mpl39/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/var/container/conda/envs/mpl39/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

Mismatched elements: 192 / 768 (25%)
Max absolute difference: 3
Max relative difference: 0.75
 x: array([[[  0,   0,   0]],
       [[  1,   0,   0]],...
 y: array([[[  0,   0,   0]],
       [[  0,   0,   0]],...
```

Even though NumPy reports a 25% mismatch, that count is over all three channels; only the 256 values in the red channel were unique, so 192 mismatches is really more like a 75% difference. You can also see in the printed indices, and in the re-converted RGB, that several values are skipped.

Probably due to python-pillow/Pillow#1852 (comment), even the explicit palette used in the example above / in this PR does not work. I think Pillow's quantization is not reliable if we want lossless palette conversion. The options seem to be either 1) implement some kind of lookup, a) in NumPy or b) via our C extensions; or 2) disable this optimization altogether. Which one we choose might depend on the speed vs. file size tradeoff.
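A minimal sketch of what option 1a could look like (a hypothetical exact-match lookup, not the code in this PR; `pack24` is a helper invented for the sketch): pack each RGB triple into one 24-bit integer, then map pixels to palette indices with a sorted search, so the match is exact rather than going through Pillow's quantization cache:

```python
import numpy as np

# A tiny palette and a 2x2 image that uses only palette colours.
palette = np.array([[10, 20, 30], [0, 0, 0], [255, 0, 0]], dtype=np.uint8)
image = np.array([[[255, 0, 0], [0, 0, 0]],
                  [[10, 20, 30], [255, 0, 0]]], dtype=np.uint8)

def pack24(rgb):
    # Pack (..., 3) uint8 RGB into a single 24-bit integer per pixel.
    rgb = rgb.astype(np.uint32)
    return (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]

pal24 = pack24(palette)
order = np.argsort(pal24).astype(np.uint8)
# Exact lookup: find each packed pixel in the sorted palette.
indices = order[np.searchsorted(pal24, pack24(image), sorter=order)]
print(indices)
# [[2 1]
#  [0 2]]
assert (palette[indices] == image).all()  # lossless round trip
```

Unlike quantization, this raises (via `searchsorted` returning a wrong slot) only if a pixel colour is genuinely absent from the palette, which the `getcolors` check already rules out.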

QuLogic marked this pull request as draft June 6, 2023 06:30
tacaswell (Member) commented:

Should we push this to 3.8 and make a call one way or the other this week?

QuLogic (Member, Author) commented:

We can discuss tomorrow or Thursday; I guess pushing out to 3.8 depends on how we decide to fix it.

jklymak (Member) commented:

It seems that, for the short term, we should revert the optimization? If a proper fix gets in for 3.8, that's great; if not, at least we won't have something buggy.

We should perhaps still consider making interpolation_stage='rgba' the default, to avoid the resampling code by default.

ksunden dismissed their stale review June 7, 2023 02:51

evidence that more is needed since my review

QuLogic (Member, Author) commented:

> We should perhaps still consider making interpolation_stage='rgba' the default to avoid the resampling code by default

Note that interpolation is a red herring; what really matters is how many colours are in the final image, and possibly its size. Interpolation just changes what the final size is.
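To illustrate that point (an illustrative check, not code from this PR): `Image.getcolors(maxcolors=256)` is what gates the indexed-image path, and it depends only on how many distinct colours survive into the final image, not on how that image was produced:

```python
import numpy as np
from PIL import Image

# A large but flat image still has one colour, so it qualifies for a palette.
flat = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))
print(flat.getcolors(maxcolors=256))   # [(262144, (0, 0, 0))]

# Random noise of the same size has far more than 256 distinct colours,
# so getcolors() gives up and returns None, and the palette path is skipped.
noise = Image.fromarray(
    np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8))
print(noise.getcolors(maxcolors=256))  # None
```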

Commit message:

Asking Pillow for an "adaptive palette" does not appear to guarantee that the chosen colours will be the same, even if asking for exactly the same number as exist in the image. And asking Pillow to quantize with an explicit palette does not work either, as Pillow uses a cache that trims the last two bits from the colour and never makes an explicit match: python-pillow/Pillow#1852 (comment)

So instead, manually calculate the indexed image using some NumPy tricks.

Additionally, since now the palette may be smaller than 256 colours, Pillow may choose to encode the image data with fewer than 8 bits per component, so we need to properly reflect that in the decode parameters (this was already done for the image parameters).

The effect on test images with *many* colours is small, with a maximum RMS of 1.024, but for images with few colours, the result can be completely wrong as in the reported matplotlib#25806.
QuLogic (Member, Author) commented:

I've now re-implemented the indexing in NumPy, instead of using Pillow. I had hoped to add a test that could show simply how the original results were wrong, but reproducing the error from #25806 requires an image of >= 7073*7073 pixels, which makes for a 12s-long test. As many other test images were also changed by fixing this bug, I did not see the benefit of adding such a test on top of updating the existing results. Instead, I've added a test that confirms that my original plan of using Pillow's quantize-with-an-explicit-palette method does not work (which, if you revert the code back to ba490cb, results in an image as in #25824 (comment)).

Additionally, to confirm that this code is now correct, I commented out the block that does this optimization, ran the test suite again, and thus checked against the normal RGB version. The only test that then fails is test_rotate_image, as so:

[image: rotate_image_pdf failed diff]

And this case seems small enough that it's a PDF renderer issue.

QuLogic marked this pull request as ready for review June 10, 2023 02:46
QuLogic (Member, Author) commented:

As an additional test, I wrote a small benchmark:

```python
from io import BytesIO
import struct
import timeit

import numpy as np
from PIL import Image


def _writePng(img):
    buffer = BytesIO()
    img.save(buffer, format="png")
    buffer.seek(8)
    png_data = b''
    bit_depth = palette = None
    while True:
        length, type = struct.unpack(b'!L4s', buffer.read(8))
        if type in [b'IHDR', b'PLTE', b'IDAT']:
            data = buffer.read(length)
            if len(data) != length:
                raise RuntimeError("truncated data")
            if type == b'IHDR':
                bit_depth = int(data[8])
            elif type == b'PLTE':
                palette = data
            elif type == b'IDAT':
                png_data += data
        elif type == b'IEND':
            break
        else:
            buffer.seek(length, 1)
        buffer.seek(4, 1)  # skip CRC
    return png_data, bit_depth, palette


def pillow(data):
    height, width, color_channels = data.shape
    img = Image.fromarray(data)
    img_colors = img.getcolors(maxcolors=256)
    if color_channels == 3 and img_colors is not None:
        num_colors = len(img_colors)
        dither = getattr(Image, 'Dither', Image).NONE
        pmode = getattr(Image, 'Palette', Image).ADAPTIVE
        img = img.convert(mode='P', dither=dither, palette=pmode,
                          colors=num_colors)
        png_data, bit_depth, palette = _writePng(img)
        palette = palette[:num_colors * 3]


def numpy(data):
    height, width, color_channels = data.shape
    img = Image.fromarray(data)
    img_colors = img.getcolors(maxcolors=256)
    if color_channels == 3 and img_colors is not None:
        num_colors = len(img_colors)
        palette = np.array([comp for _, color in img_colors for comp in color],
                           dtype=np.uint8)
        palette24 = ((palette[0::3].astype(np.uint32) << 16) |
                     (palette[1::3].astype(np.uint32) << 8) |
                     palette[2::3])
        rgb24 = ((data[:, :, 0].astype(np.uint32) << 16) |
                 (data[:, :, 1].astype(np.uint32) << 8) |
                 data[:, :, 2])
        indices = np.argsort(palette24).astype(np.uint8)
        rgb8 = indices[np.searchsorted(palette24, rgb24, sorter=indices)]
        img = Image.fromarray(rgb8, mode='P')
        img.putpalette(palette)
        png_data, bit_depth, palette = _writePng(img)
        palette = palette[:num_colors * 3]


for size in [2, 10, 100, 1000, 10000]:
    print(size)
    data = np.full((size, size, 3), 239, dtype=np.uint8)
    data[:size // 2, size // 2:, :] = (255, 0, 0)
    for mode in ['pillow', 'numpy']:
        print(mode, timeit.timeit(f'{mode}(data)', globals=globals(),
                                  number=10))
```

This creates an 'image' of various sizes in grey, with a red square in the top-right quadrant, and then times the old and new indexing. The results are:

```
2
pillow 0.013012180104851723
numpy 0.001572401961311698
10
pillow 0.0008533929940313101
numpy 0.001532592112198472
100
pillow 0.006254185922443867
numpy 0.0027515022084116936
1000
pillow 0.5866641108877957
numpy 0.17633948801085353
10000
pillow 57.763156425906345
numpy 31.72443200601265
```

Surprisingly, in almost all cases, NumPy is faster than Pillow.

ksunden merged commit 475612e into matplotlib:main Jun 13, 2023
lumberbot-app commented:

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

1. Checkout the backport branch and update it:

       git checkout v3.7.x
       git pull

2. Cherry-pick the first parent branch of this PR on top of the older branch:

       git cherry-pick -x -m1 475612ef50a7d4881eebaa0ebe5b410b7b202b28

3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:

       git commit -am 'Backport PR #25824: pdf: Use explicit palette when saving indexed images'

4. Push to a named branch:

       git push YOURFORK v3.7.x:auto-backport-of-pr-25824-on-v3.7.x

5. Create a PR against branch v3.7.x. I would have named this PR:

   "Backport PR #25824 on branch v3.7.x (pdf: Use explicit palette when saving indexed images)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

QuLogic (Member, Author) commented:

Conflicts are:

```
Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
        both modified:   lib/matplotlib/tests/baseline_images/test_backend_pdf/grayscale_alpha.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/image_alpha.pdf
        deleted by us:   lib/matplotlib/tests/baseline_images/test_image/image_placement.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/image_shift.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/imshow_masked_interpolation.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/no_interpolation_origin.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/rotate_image.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_tightlayout/tight_layout5.pdf
```

and these all appear to be caused by #25704, which wasn't backported.

QuLogic (Member, Author) commented:

The backport should work now with #25704 backported, I think.

@meeseeksdev backport to v3.7.x

QuLogic added a commit that referenced this pull request Jun 29, 2023:

Backport PR #25824 on branch v3.7.x (pdf: Use explicit palette when saving indexed images)
Reviewers

tacaswell approved these changes

ksunden left review comments

Assignees
No one assigned
Projects
None yet
Milestone
v3.7.2
Development

Successfully merging this pull request may close these issues.

[Bug]: pdf export for large image sizes results in wrong colors
cm.set_bad() not working for specific values of grayscale and dpi when saving as pdf
4 participants
@QuLogic, @ksunden, @tacaswell, @jklymak
