Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Reproducible PS/PDF output (master)#6597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
QuLogic merged 31 commits intomatplotlib:masterfromunknown repositoryDec 9, 2016
Merged

Reproducible PS/PDF output (master)#6597

QuLogic merged 31 commits intomatplotlib:masterfromunknown repositoryDec 9, 2016

Conversation

ghost
Copy link

This is a rebase of#6595 to the master branch.

Several software packages use matplotlib in their building process (mainly to produce PS or PDF documents). To make their build reproducible, it would be great to make matplotlib output reproducible.

To allow reproducible PS and PDF output:

  • honour SOURCE_DATE_EPOCH for timestamps in PS and PDF files.
    Seehttps://reproducible-builds.org/specs/source-date-epoch/
  • get keys sorted so that hatch patterns, images and markers are included with
    a reproducible order in the PDF file. Another solution is to sortself.hatchPatterns inwriteHatches (and similar ordering inwriteImages andwriteMarkers), but this consumes more memory.

This patch has been submitted in debian bug#827361

See alsohttps://reproducible-builds.org/

* honour SOURCE_DATE_EPOCH for timestamps in PS and PDF files.  Seehttps://reproducible-builds.org/specs/source-date-epoch/* get keys sorted so that hatchPatterns, images and markers are included with  a reproducible order in the PDF file.Seehttps://reproducible-builds.org/
@ghostghost mentioned this pull requestJun 16, 2016
@tacaswelltacaswell added this to the2.1 (next point release) milestoneJun 17, 2016
@tacaswell
Copy link
Member

I think we are already sorting all of those keys on the way out so do not need the ordered dicts.

Can you add an entry inhttps://github.com/matplotlib/matplotlib/tree/master/doc/users/whats_new and a test (like the deterministic svg test) that this actually works?

attn@jkseppan

@jkseppan
Copy link
Member

I agree that a test is needed. I don't see the reason for the ordered dicts either, but there could be some subtlety that I'm missing. It would be neat to make the date settable by the user (via e.g. the PdfPages constructor) but of course not necessary for this particular use case.

@ghost
Copy link
Author

I added a test for reproducible PDF output (you're right: this is very important!).
The test shows that the ordered dicts are needed for markers and hatches, due to the dict iterations at the beginning ofwriteHatches andwriteMarkers, which are unreproducible with python3.
Even if the same iteration occurs inwriteImages, the tests passes without ordering_images. I think this is because the_images keys are integers.
I also added two orderings (related to fonts), that appeared to be essential thanks to the test.

@jkseppan
Copy link
Member

Oh, of course! I was thinking about the page stream, but it also matters in which order the objects are output at the top level of the pdf file. I suspect that the ordering is needed for_images too - maybe the dict implementation is a list for integer-keyed small dictionaries, or something.

@ghost
Copy link
Author

I would like to add tests for postscript too, but I'm afraid the code will be nearly the same as for PDF: do you think I should move the common code somewhere (maybe in a newtesting/reproducible.py file), or duplicate it intest_backend_ps.py andtest_backend_pdf.py?

@tacaswell
Copy link
Member

.SK...======================================================================FAIL: Test SOURCE_DATE_EPOCH support for PS output----------------------------------------------------------------------Traceback (most recent call last):  File "C:\Miniconda-x64\envs\test-environment\lib\site-packages\nose\case.py", line 197, in runTest    self.test(*self.arg)  File "c:\projects\matplotlib\lib\matplotlib\testing\decorators.py", line 152, in wrapped_callable    func(*args, **kwargs)  File "c:\projects\matplotlib\lib\matplotlib\tests\test_backend_ps.py", line 186, in test_source_date_epoch    _test_source_date_epoch("ps", b"%%CreationDate: Sat Jan  1 00:00:00 2000")  File "c:\projects\matplotlib\lib\matplotlib\testing\determinism.py", line 122, in _test_source_date_epoch    assert string in buffAssertionError--------------------------------------------------------------------

These failures on appveyor look real.

@tacaswell
Copy link
Member

My naive guess is that there is a unicode vs bytes issues

@ghost
Copy link
Author

These failures on appveyor look real.

Sure! I'm working on it but for now I can't reproduce the problem at home.

Alexis Bienvenüe added4 commitsJuly 8, 2016 12:16
…th, day of month and times.This way we don't mind if timestamps are written with leading 0 or space.
@ghost
Copy link
Author

I think this is better now.

  • For parallel tests, I think threads are used, and they share the same environment. So setting the SOURCE_DATE_EPOCH environment variable in a test is not good. So I used subprocesses instead.
  • The timestamp was 'Jan 1' or 'Jan 01' sometimes, so I used Dec 15 instead to solve this issue.

Appveyor is now reporting only aHTTPError: 500 Server Error. I think this is not related to Matplotlib code, but I don't know if I can restart the build.

@tacaswell
Copy link
Member

👍 from me, attn@jkseppan can you do a final review?


The ``SOURCE_DATE_EPOCH`` environment variable can now be used to set
the timestamps value in the PS and PDF outputs, which are then
reproducible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We should list the known limitations, because when downstream users see this promise, they may start depending on it and run into the limitations later. At least usetex with the ps backend should be ruled out, based on comments in the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please add a link to the specification that describes the value of this environment variable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Other possible limitations include at least mathtext, ps.usedistiller, ps.fonttype, and pdf.fonttype. These are features or settings that I think could easily hide a source of nondeterminism and that don't seem to get exercised by the new test. (To get a more reliable list of relevant features than just my guesses, you'd need to run the test with coverage checking and see which parts of the backends get no coverage.)

To be clear, I don't mean we should delay merging until there are tests for everything, but we should be realistic in how we document this. I imagine the reproducible-builds community would appreciate the feature as it is now, with careful documentation of what exactly they can rely on.

@jkseppan
Copy link
Member

Commitc007f49 changes the tests to use a two-digit date in December instead of January 1st. I think the underlying problem should be fixed instead of avoiding it in the tests, since we don't want to tell users that theirSOURCE_DATE_EPOCH values must be two-digit days.

@ghost
Copy link
Author

I considered reproducibility this way: with the same tools (same versions of all what you installed on your computer), the same commands lead to the same results.
With this point of view, I accept that with some different versions/implementations of python, the date can be written different ways (leading space or 0 for example). The problem with the tests was that we face different implementations with different behaviors. We don't pretend that the output will be exactly the same with all future versions of all the tools, all implementations of python.
I also accept that ghostscript can change its way to format the timestamp in a future version, but I did not include a test with usetex relying to the current behavior of ghostscript. Maybe this is too prudent.
Let me try to be more explicit about that in the doc.

@Kojoley
Copy link
Member

I think the failure is due toghostscript (some versions now honorsSOURCE_DATE_EPOCH, but some don't).

The fail happens on the build where ghostscript is not installed, sobf7387e commit is not the right way to handle such situation.

@ghost
Copy link
Author

The[XPASS(strict)] This test needs a ghostscript installation fail happened because the test should have failed (tagged@needs_ghostscript) but did not fail, becauseusetex was not properly passed to_determinism_save beforeaf4213e, so that the test ran without using ghostscript.
Correcting this, I then saw that now thatusetex is properly passed, the test usingusetex failed because ofghostscript, which does not honourSOURCE_DATE_EPOCH in the installed version. So I appliedbf7387e so that we can see if the test pass (goodghostscript version) or not, but without failing.

@tacaswell
Copy link
Member

'power cycled to restart CI (appveyor failures looked due to qt-related packaging issues).

@tacaswelltacaswell changed the titleReproducible PS/PDF output (master)[MRG+1] Reproducible PS/PDF output (master)Nov 1, 2016
Copy link
Member

@tacaswelltacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Resuse UTC timezone from dates.py

@@ -135,6 +136,20 @@ def _string_escape(match):
assert False


# tzinfo class for UTC
class UTCtimezone(tzinfo):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We already have this in mpl/dates.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes! Sorry I missed that.

Copy link
Member

@tacaswelltacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Resuse UTC timezone from dates.py

Copy link
Member

@tacaswelltacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Resuse UTC timezone from dates.py

@tacaswelltacaswell changed the title[MRG+1] Reproducible PS/PDF output (master)[MRG] Reproducible PS/PDF output (master)Nov 1, 2016
@tacaswell
Copy link
Member

Other than one minor change, this looks good to go!

Don't worry about the coverall failure, that is un-related.

Sorry this has dragged on so long.

@QuLogic
Copy link
Member

The coveralls drop is probably due to a change in master, and not this PR; a rebase to latest master might help that, but is not strictly necessary.

Copy link
Member

@QuLogicQuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A couple of minor things.

------------------------------

The ``SOURCE_DATE_EPOCH`` environment variable can now be used to set
the timestamps value in the PS and PDF outputs. See
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

timestamp

default value is "mhi", so that the test includes all these objects.
format : str
format string. The default value is "pdf".
uid : str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not seeing this being used for anything?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Theuid option is used intest_determinism_all_tex fromlib/matplotlib/tests/test_backend_ps.py

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

But it was only useful when files were used to store the figures. I will remove that.

format string, such as "pdf".
string : str
timestamp string for 2000-01-01 00:00 UTC.
keyword : str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It appears to bebytes, notstr.

@ghost
Copy link
Author

@QuLogic: thanks a lot for your review. I corrected those points.

@QuLogic
Copy link
Member

A rebase might fix coveralls (as I'm pretty sure it was a change in master that dropped coverage), but it's not strictly needed. I guess we aren't really waiting on anything else here.

@QuLogicQuLogic changed the title[MRG] Reproducible PS/PDF output (master)[MRG+1] Reproducible PS/PDF output (master)Dec 7, 2016
@QuLogicQuLogic merged commiteadafc6 intomatplotlib:masterDec 9, 2016
@QuLogicQuLogic changed the title[MRG+1] Reproducible PS/PDF output (master)Reproducible PS/PDF output (master)Dec 9, 2016
@ghost
Copy link
Author

Thanks to you all for your kind support!

@tacaswell
Copy link
Member

@JojoBoulix Thanks for taking care of this! Sorry it was such a protracted process.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers
No reviews
Assignees
No one assigned
Projects
None yet
Milestone
v2.1
Development

Successfully merging this pull request may close these issues.

7 participants
@tacaswell@jkseppan@QuLogic@dopplershift@Kojoley@NelleV@mdboom

[8]ページ先頭

©2009-2025 Movatter.jp