pdf: Use explicit palette when saving indexed images #25824

Merged
ksunden merged 1 commit into matplotlib:main from QuLogic:fix-index-pdf
Jun 13, 2023

Conversation

QuLogic (Member) commented May 5, 2023 (edited)

PR summary

Asking Pillow for an "adaptive palette" does not appear to guarantee that the chosen colours will be the same, even if asking for exactly the same number as exist in the image. Instead, create an explicit palette, and quantize using it.

Additionally, since now the palette may be smaller than 256 colours, Pillow may choose to encode the image data with fewer than 8 bits per component, so we need to properly reflect that in the decode parameters (this was already done for the image parameters).
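As a hypothetical illustration of that bit-depth rule (assuming, per the PNG format, that indexed images may use 1, 2, 4, or 8 bits per sample, and that the encoder picks the smallest width that fits the palette; `indexed_bit_depth` is a name invented for this sketch):

```python
import math

def indexed_bit_depth(num_colors):
    # Smallest PNG-legal bit depth (1, 2, 4, or 8) whose index range
    # can address `num_colors` palette entries.
    needed = max(1, math.ceil(math.log2(num_colors)))
    return next(d for d in (1, 2, 4, 8) if needed <= d)

# Palettes of 2, 3, 5, 17, and 256 colours:
print([indexed_bit_depth(n) for n in (2, 3, 5, 17, 256)])  # [1, 2, 4, 8, 8]
```

The decode parameters in the PDF have to use this same width, or the index values read back from the stream will be misaligned with the palette.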

The effect on test images with *many* colours is small, with a maximum RMS of 1.024, but for images with few colours, the result can be completely wrong as in the reported #25806.

Since this bugginess requires a large image with few colours, and relies on possibly-fixable caching in Pillow, I opted to just update the existing test images instead; we should be a little more careful when changing them in the future.

Fixes #20575
Fixes #25806

PR checklist

ksunden previously approved these changes May 6, 2023
ksunden (Member) commented:

Looks like there is one additional failing test left unresolved:

lib/matplotlib/tests/test_image.py::test_figimage[pdf-False]

QuLogic (Member, Author) commented:

I can reproduce this on Ubuntu WSL, but it's for both suppressComposite settings. Confusingly, if I copy the result into the baseline, git shows no differences and the tests still fail. Not sure if there's a Ghostscript bug on Ubuntu.

QuLogic (Member, Author) commented:

Further testing shows that the results are in fact still wrong. Disabling the code for indexed images results in all the related tests failing.

Testing a fairly simple red-gradient image with pure Pillow, using the conversion attempted in this PR:

```python
import numpy as np
from PIL import Image

x = np.zeros((256, 1, 3), dtype=np.uint8)
x[:, 0, 0] = np.arange(256)
orig = Image.fromarray(x)

palette = Image.new('P', (1, 1))
palette.putpalette([comp for _, color in orig.getcolors() for comp in color])

new = orig.quantize(dither=Image.Dither.NONE, palette=palette)
print(*new.getdata())

y = np.asarray(new.convert('RGB'))
print(y[:, 0, 0])
np.testing.assert_array_equal(x, y)
```

results in an assertion failure:

```
255 255 255 255 251 251 251 251 247 247 247 247 243 243 243 243
239 239 239 239 235 235 235 235 231 231 231 231 227 227 227 227
223 223 223 223 219 219 219 219 215 215 215 215 211 211 211 211
207 207 207 207 203 203 203 203 199 199 199 199 195 195 195 195
191 191 191 191 187 187 187 187 183 183 183 183 179 179 179 179
175 175 175 175 171 171 171 171 167 167 167 167 163 163 163 163
159 159 159 159 155 155 155 155 151 151 151 151 147 147 147 147
143 143 143 143 139 139 139 139 135 135 135 135 131 131 131 131
127 127 127 127 123 123 123 123 119 119 119 119 115 115 115 115
111 111 111 111 107 107 107 107 103 103 103 103 99 99 99 99
95 95 95 95 91 91 91 91 87 87 87 87 83 83 83 83
79 79 79 79 75 75 75 75 71 71 71 71 67 67 67 67
63 63 63 63 59 59 59 59 55 55 55 55 51 51 51 51
47 47 47 47 43 43 43 43 39 39 39 39 35 35 35 35
31 31 31 31 27 27 27 27 23 23 23 23 19 19 19 19
15 15 15 15 11 11 11 11 7 7 7 7 3 3 3 3
[  0   0   0   0   4   4   4   4   8   8   8   8  12  12  12  12
  16  16  16  16  20  20  20  20  24  24  24  24  28  28  28  28
  32  32  32  32  36  36  36  36  40  40  40  40  44  44  44  44
  48  48  48  48  52  52  52  52  56  56  56  56  60  60  60  60
  64  64  64  64  68  68  68  68  72  72  72  72  76  76  76  76
  80  80  80  80  84  84  84  84  88  88  88  88  92  92  92  92
  96  96  96  96 100 100 100 100 104 104 104 104 108 108 108 108
 112 112 112 112 116 116 116 116 120 120 120 120 124 124 124 124
 128 128 128 128 132 132 132 132 136 136 136 136 140 140 140 140
 144 144 144 144 148 148 148 148 152 152 152 152 156 156 156 156
 160 160 160 160 164 164 164 164 168 168 168 168 172 172 172 172
 176 176 176 176 180 180 180 180 184 184 184 184 188 188 188 188
 192 192 192 192 196 196 196 196 200 200 200 200 204 204 204 204
 208 208 208 208 212 212 212 212 216 216 216 216 220 220 220 220
 224 224 224 224 228 228 228 228 232 232 232 232 236 236 236 236
 240 240 240 240 244 244 244 244 248 248 248 248 252 252 252 252]
Traceback (most recent call last):
  File "/home/elliott/code/matplotlib/foo.py", line 15, in <module>
    np.testing.assert_array_equal(orig, new.convert('RGB'))
  File "/var/container/conda/envs/mpl39/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 983, in assert_array_equal
    assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
  File "/var/container/conda/envs/mpl39/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/var/container/conda/envs/mpl39/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

Mismatched elements: 192 / 768 (25%)
Max absolute difference: 3
Max relative difference: 0.75
 x: array([[[  0,   0,   0]],
       [[  1,   0,   0]],...
 y: array([[[  0,   0,   0]],
       [[  0,   0,   0]],...
```

Even though NumPy reports a 25% mismatch, that count is over all three channels; only the 256 values in the red channel were unique, so 192 mismatches is really more like a 75% difference. You can also see in the printed indices, and in the re-converted RGB, that several values are skipped.

Probably due to python-pillow/Pillow#1852 (comment), even the explicit palette used in the example above / in this PR does not work. I think Pillow's quantization is not reliable if we want lossless palette conversion. The options seem to be either 1) implement some kind of lookup, a) in NumPy or b) via our C extensions; or 2) disable this optimization altogether. Which one we choose might depend on the speed vs. file size tradeoff.
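A minimal sketch of what option 1a could look like (a hypothetical exact-match lookup, not the code in this PR; `pack24` is a helper invented for the sketch): pack each RGB triple into one 24-bit integer, then map pixels to palette indices with a sorted search, so the match is exact rather than going through Pillow's quantization cache:

```python
import numpy as np

# A tiny palette and a 2x2 image that uses only palette colours.
palette = np.array([[10, 20, 30], [0, 0, 0], [255, 0, 0]], dtype=np.uint8)
image = np.array([[[255, 0, 0], [0, 0, 0]],
                  [[10, 20, 30], [255, 0, 0]]], dtype=np.uint8)

def pack24(rgb):
    # Pack (..., 3) uint8 RGB into a single 24-bit integer per pixel.
    rgb = rgb.astype(np.uint32)
    return (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]

pal24 = pack24(palette)
order = np.argsort(pal24).astype(np.uint8)
# Exact lookup: find each packed pixel in the sorted palette.
indices = order[np.searchsorted(pal24, pack24(image), sorter=order)]
print(indices)
# [[2 1]
#  [0 2]]
assert (palette[indices] == image).all()  # lossless round trip
```

Unlike quantization, this raises (via `searchsorted` returning a wrong slot) only if a pixel colour is genuinely absent from the palette, which the `getcolors` check already rules out.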

QuLogic marked this pull request as draft June 6, 2023 06:30
tacaswell (Member) commented:

Should we push this to 3.8 and make a call one way or the other this week?

QuLogic (Member, Author) commented:

We can discuss tomorrow or Thursday; I guess pushing out to 3.8 depends on how we decide to fix it.

jklymak (Member) commented:

It seems that, for the short term, we should revert the optimization? If a proper fix gets in for 3.8, that's great; if not, at least we won't have something buggy.

We should perhaps still consider making interpolation_stage='rgba' the default, to avoid the resampling code by default.

ksunden dismissed their stale review June 7, 2023 02:51

evidence that more is needed since my review

QuLogic (Member, Author) commented:

> We should perhaps still consider making interpolation_stage='rgba' the default to avoid the resampling code by default

Note that interpolation is a red herring; what really matters is how many colours are in the final image, and possibly its size. Interpolation just changes what the final size is.
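To illustrate that point (an illustrative check, not code from this PR): `Image.getcolors(maxcolors=256)` is what gates the indexed-image path, and it depends only on how many distinct colours survive into the final image, not on how that image was produced:

```python
import numpy as np
from PIL import Image

# A large but flat image still has one colour, so it qualifies for a palette.
flat = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))
print(flat.getcolors(maxcolors=256))   # [(262144, (0, 0, 0))]

# Random noise of the same size has far more than 256 distinct colours,
# so getcolors() gives up and returns None, and the palette path is skipped.
noise = Image.fromarray(
    np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8))
print(noise.getcolors(maxcolors=256))  # None
```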

Commit message:

Asking Pillow for an "adaptive palette" does not appear to guarantee that the chosen colours will be the same, even if asking for exactly the same number as exist in the image. And asking Pillow to quantize with an explicit palette does not work either, as Pillow uses a cache that trims the last two bits from the colour and never makes an explicit match: python-pillow/Pillow#1852 (comment)

So instead, manually calculate the indexed image using some NumPy tricks.

Additionally, since now the palette may be smaller than 256 colours, Pillow may choose to encode the image data with fewer than 8 bits per component, so we need to properly reflect that in the decode parameters (this was already done for the image parameters).

The effect on test images with *many* colours is small, with a maximum RMS of 1.024, but for images with few colours, the result can be completely wrong as in the reported matplotlib#25806.
QuLogic (Member, Author) commented:

I've now re-implemented the indexing in NumPy, instead of using Pillow. I had hoped to add a test that could show simply how the original results were wrong, but reproducing the error from #25806 requires an image of >= 7073*7073 pixels, which makes for a 12s-long test. As many other test images were also changed by fixing this bug, I did not see the benefit of adding such a test on top of updating the existing results. Instead, I've added a test that confirms that my original plan of using Pillow's quantize-with-an-explicit-palette method does not work (which, if you revert the code back to ba490cb, results in an image as in #25824 (comment)).

Additionally, to confirm that this code is now correct, I commented out the block that does this optimization, ran the test suite again, and thus checked against the normal RGB version. The only test that then fails is test_rotate_image, as so:

[image: rotate_image_pdf failed diff]

And this case seems small enough that it's a PDF renderer issue.

QuLogic marked this pull request as ready for review June 10, 2023 02:46
QuLogic (Member, Author) commented:

As an additional test, I wrote a small benchmark:

```python
from io import BytesIO
import struct
import timeit

import numpy as np
from PIL import Image


def _writePng(img):
    buffer = BytesIO()
    img.save(buffer, format="png")
    buffer.seek(8)
    png_data = b''
    bit_depth = palette = None
    while True:
        length, type = struct.unpack(b'!L4s', buffer.read(8))
        if type in [b'IHDR', b'PLTE', b'IDAT']:
            data = buffer.read(length)
            if len(data) != length:
                raise RuntimeError("truncated data")
            if type == b'IHDR':
                bit_depth = int(data[8])
            elif type == b'PLTE':
                palette = data
            elif type == b'IDAT':
                png_data += data
        elif type == b'IEND':
            break
        else:
            buffer.seek(length, 1)
        buffer.seek(4, 1)  # skip CRC
    return png_data, bit_depth, palette


def pillow(data):
    height, width, color_channels = data.shape
    img = Image.fromarray(data)
    img_colors = img.getcolors(maxcolors=256)
    if color_channels == 3 and img_colors is not None:
        num_colors = len(img_colors)
        dither = getattr(Image, 'Dither', Image).NONE
        pmode = getattr(Image, 'Palette', Image).ADAPTIVE
        img = img.convert(mode='P', dither=dither, palette=pmode,
                          colors=num_colors)
        png_data, bit_depth, palette = _writePng(img)
        palette = palette[:num_colors * 3]


def numpy(data):
    height, width, color_channels = data.shape
    img = Image.fromarray(data)
    img_colors = img.getcolors(maxcolors=256)
    if color_channels == 3 and img_colors is not None:
        num_colors = len(img_colors)
        palette = np.array([comp for _, color in img_colors for comp in color],
                           dtype=np.uint8)
        palette24 = ((palette[0::3].astype(np.uint32) << 16) |
                     (palette[1::3].astype(np.uint32) << 8) |
                     palette[2::3])
        rgb24 = ((data[:, :, 0].astype(np.uint32) << 16) |
                 (data[:, :, 1].astype(np.uint32) << 8) |
                 data[:, :, 2])
        indices = np.argsort(palette24).astype(np.uint8)
        rgb8 = indices[np.searchsorted(palette24, rgb24, sorter=indices)]
        img = Image.fromarray(rgb8, mode='P')
        img.putpalette(palette)
        png_data, bit_depth, palette = _writePng(img)
        palette = palette[:num_colors * 3]


for size in [2, 10, 100, 1000, 10000]:
    print(size)
    data = np.full((size, size, 3), 239, dtype=np.uint8)
    data[:size // 2, size // 2:, :] = (255, 0, 0)
    for mode in ['pillow', 'numpy']:
        print(mode, timeit.timeit(f'{mode}(data)', globals=globals(),
                                  number=10))
```

This creates an 'image' of various sizes in grey, with a red square in the top-right quadrant, and then times the old and new indexing. The results are:

```
2
pillow 0.013012180104851723
numpy 0.001572401961311698
10
pillow 0.0008533929940313101
numpy 0.001532592112198472
100
pillow 0.006254185922443867
numpy 0.0027515022084116936
1000
pillow 0.5866641108877957
numpy 0.17633948801085353
10000
pillow 57.763156425906345
numpy 31.72443200601265
```

Surprisingly, in almost all cases, NumPy is faster than Pillow.

ksunden merged commit 475612e into matplotlib:main Jun 13, 2023
lumberbot-app commented:

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

1. Checkout the backport branch and update it:

       git checkout v3.7.x
       git pull

2. Cherry-pick the first parent branch of this PR on top of the older branch:

       git cherry-pick -x -m1 475612ef50a7d4881eebaa0ebe5b410b7b202b28

3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:

       git commit -am 'Backport PR #25824: pdf: Use explicit palette when saving indexed images'

4. Push to a named branch:

       git push YOURFORK v3.7.x:auto-backport-of-pr-25824-on-v3.7.x

5. Create a PR against branch v3.7.x. I would have named this PR:

   "Backport PR #25824 on branch v3.7.x (pdf: Use explicit palette when saving indexed images)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

QuLogic (Member, Author) commented:

Conflicts are:

```
Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
        both modified:   lib/matplotlib/tests/baseline_images/test_backend_pdf/grayscale_alpha.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/image_alpha.pdf
        deleted by us:   lib/matplotlib/tests/baseline_images/test_image/image_placement.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/image_shift.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/imshow_masked_interpolation.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/no_interpolation_origin.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_image/rotate_image.pdf
        both modified:   lib/matplotlib/tests/baseline_images/test_tightlayout/tight_layout5.pdf
```

and these all appear to be caused by #25704, which wasn't backported.

QuLogic (Member, Author) commented:

The backport should work now with #25704 backported, I think.

@meeseeksdev backport to v3.7.x

QuLogic added a commit that referenced this pull request Jun 29, 2023:

Backport PR #25824 on branch v3.7.x (pdf: Use explicit palette when saving indexed images)
Reviewers

tacaswell approved these changes

ksunden left review comments

Assignees
No one assigned
Projects
None yet
Milestone
v3.7.2
Development

Successfully merging this pull request may close these issues.

[Bug]: pdf export for large image sizes results in wrong colors
cm.set_bad() not working for specific values of grayscale and dpi when saving as pdf
4 participants
@QuLogic, @ksunden, @tacaswell, @jklymak
