Type-1 font subsetting #20716


Open
jkseppan wants to merge 6 commits into matplotlib:main from jkseppan:type1-subset

@jkseppan (Member) commented Jul 22, 2021 (edited):

PR Summary

Type-1 subsetting

This reduces pdf file sizes when usetex is active, at the cost of
some complexity in the code. We implement a charstring bytecode
interpreter to keep track of subroutine calls in font programs.

Recommend merging to main to give people time to test this, not to
a 3.10 point release.

Give dviread.DviFont a fake filename attribute and a get_fontmap
method for character tracking.

Add type hints to the code this touches.

Closes #127.
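
To illustrate what "keeping track of subroutine calls" involves, here is a minimal sketch of a charstring walker. It is not the code in this PR: the names `tokens` and `used_subrs` are hypothetical, the charstrings are assumed to be already decrypted with the leading random bytes stripped, and every command other than `callsubr`, `return`, `callothersubr` and `pop` is simplified to "clear the operand stack".

```python
from collections.abc import Iterator

# Type-1 charstring command codes (Adobe Type 1 font format).
CALLSUBR = (10,)
RETURN = (11,)
CALLOTHERSUBR = (12, 16)
POP = (12, 17)


def tokens(code: bytes) -> Iterator[tuple[str, object]]:
    """Yield ('num', int) or ('op', tuple) items from a decrypted charstring."""
    i = 0
    while i < len(code):
        b = code[i]
        i += 1
        if 32 <= b <= 246:          # one-byte integer
            yield 'num', b - 139
        elif 247 <= b <= 250:       # two-byte positive integer
            yield 'num', (b - 247) * 256 + code[i] + 108
            i += 1
        elif 251 <= b <= 254:       # two-byte negative integer
            yield 'num', -(b - 251) * 256 - code[i] - 108
            i += 1
        elif b == 255:              # four-byte signed integer
            yield 'num', int.from_bytes(code[i:i + 4], 'big', signed=True)
            i += 4
        elif b == 12:               # escape: two-byte command
            yield 'op', (12, code[i])
            i += 1
        else:                       # one-byte command
            yield 'op', (b,)


def used_subrs(charstring: bytes, subrs: dict[int, bytes]) -> set[int]:
    """Return the indices of all subroutines reachable from one charstring."""
    used: set[int] = set()
    stack: list[int] = []      # charstring operand stack
    ps_stack: list[int] = []   # values handed over to OtherSubrs

    def run(code: bytes) -> None:
        for kind, value in tokens(code):
            if kind == 'num':
                stack.append(value)
            elif value == CALLSUBR:
                index = stack.pop()
                used.add(index)
                run(subrs[index])
            elif value == CALLOTHERSUBR:
                stack.pop()                      # OtherSubr number (ignored)
                for _ in range(stack.pop()):     # argument count
                    ps_stack.append(stack.pop())
            elif value == POP:
                stack.append(ps_stack.pop())
            elif value == RETURN:
                return
            else:
                stack.clear()  # simplification: drawing commands consume operands

    run(charstring)
    return used
```

A subsetter could take the union of `used_subrs` over the glyphs it keeps and then drop or blank the remaining entries of the Subrs array. The `callothersubr`/`pop` handling matters because hint replacement passes the subroutine number through an OtherSubr before calling `callsubr`.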

PR Checklist

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (run flake8 on changed files to check).
  • New features are documented, with examples if plot related.
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

mpetroff reacted with hooray emoji
@anntzer (Contributor):

Is this ready for review, now that #20715 has been merged?

@jkseppan (Member, Author):

This is failing on Ubuntu 22.04 and Windows but passing on 24.04 and Mac. Here's one failing image (test_usetex_pdf.png, so converted from pdf to png on the test system).

[Image: test_usetex_pdf]

This looks like the font is entirely broken. The expected image similarly converted looks like this:

[Image: test_usetex-expected_pdf]

Strangely enough, the generated pdf file looks fine on my Mac.

@jkseppan (Member, Author):

I can repeat the error running on an Ubuntu 22.04 docker image:

GPL Ghostscript 9.55.0 (2021-09-27)
Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Querying operating system for font files...
Substituting font Helvetica for MPLAAD+CMR17.
Loading NimbusSans-Regular font from /usr/share/ghostscript/9.55.0/Resource/Font/NimbusSans-Regular... 4467044 2957798 6795968 5451925 4 done.
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Substituting font Helvetica for MPLAAC+CMR12.
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Substituting font Helvetica for MPLAAA+CMEX10.
   **** Error: can't process embedded font stream,
        attempting to load the font using its name.
               Output may be incorrect.
Substituting font Helvetica-Oblique for MPLAAB+CMMI12.
Loading NimbusSans-Italic font from /usr/share/ghostscript/9.55.0/Resource/Font/NimbusSans-Italic... 4534308 3175939 7138400 5761600 4 done.

The subsetting is wrong in some way that breaks Ghostscript 9.55 but not the viewer in macOS or the newer Ghostscript in Ubuntu 24.04. (Ghostscript 9.56 has a completely rewritten PDF interpreter.)
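
A check like the one above can be automated by scanning Ghostscript's output for font substitutions. The following is a minimal sketch, not part of the PR; the helper name `substituted_fonts` is hypothetical, and it assumes the message format shown in the log above.

```python
import re

# Matches lines such as "Substituting font Helvetica for MPLAAD+CMR17."
_SUBSTITUTION = re.compile(r"Substituting font (\S+) for (\S+)\.")


def substituted_fonts(gs_output: str) -> dict[str, str]:
    """Map each embedded font that Ghostscript rejected to its fallback."""
    return {
        embedded: fallback
        for fallback, embedded in _SUBSTITUTION.findall(gs_output)
    }
```

The text to scan could come from running something like `gs -dNOPAUSE -dBATCH -sDEVICE=nullpage file.pdf` and capturing its output; a non-empty result would indicate that at least one embedded font stream could not be processed.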

@jkseppan (Member, Author):

I hope I found the culprit... I was writing an extra delimiter between the Subrs and the Charstrings when one was already there.

@jkseppan force-pushed the type1-subset branch 2 times, most recently from 9c5d971 to 9a9dc05 (May 4, 2025 17:15)
@jkseppan marked this pull request as ready for review (May 4, 2025 17:16)
@@ -35,10 +37,64 @@

from matplotlib.cbook import _format_approx
from . import _api
if T.TYPE_CHECKING:
    from collections.abc import Iterable
Contributor:

We tend not to have inline type hints (and I personally very much don't like them), but there are a few exceptions (e.g. _mathtext.py) so I guess it's up to you whether you want to leave them in.

Member, Author:

Right – I saw that there are some files with type hints, and figured that the project might be in the process of adding them. I've found type hints pretty useful at work, but of course we should have a consistent style in the project. Has this been discussed in the past on the dev list or somewhere?

Contributor:

I'm probably among the most vocal opponents, but I'll defer to @QuLogic or @ksunden on this aspect.

Member:

I gave my opinion on our gitter, but I'll copy the main comment here for the sake of consolidation/potential for finding it in the future:

I would say type hints are a net positive in my opinion, though I acknowledge that there are problems (perhaps especially in a case like ours where APIs were designed well before type hints).

We went with stub files on initial implementation specifically to minimize some risk: since the stub files are not even loaded at runtime, there was no chance of them interfering.

However, I do think inline hints reduce a level of ongoing maintenance risk, in particular the chance of the two files getting out of sync. (We have tooling/CI in place to help catch such things, but I don't think it fully removes the risk.)

The other factor is that stub files allow us to draw the line at public APIs, and not have to worry about typing our internal logic. (Which has positives and negatives, the positives being largely catching additional problems with the type checker, the negative largely being the surface area that needs to be covered.)

So in all, I think my personal recommendation would be a relatively slow uptake, where we keep the stub files for most public facing things in the interim, do inline type hints for internal logic when it makes sense to do so (e.g. when doing so is actively helpful for some refactor/feature addition/etc) then once the majority of internal logic has inline hints, consider inlining the hints of the more public facing APIs.

So looking at this example in particular, I think this does fall within my recommendation there. I am not personally opposed to it, but also not hard pulling for it. I do question a bit whether to advocate for "do minimal as you are working and motivated" or "the unit for adding type hints should be one file at a time". Doing the latter would carry the advantage that you get a better sense of how complete the hints are, but the disadvantage of asking people to type hint 3-4x what they were otherwise looking at (in this case, for example), which also impacts the review-ability of the PR as there are lots of changes that are actually orthogonal.

For what it's worth, this particular file already has one function which got an inline type hint added in #27796.
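
For readers unfamiliar with the pattern in the diff hunk under discussion, here is a minimal sketch of an inline hint guarded by `TYPE_CHECKING`. The function `total_length` is a hypothetical example, not code from the PR.

```python
import typing as T

if T.TYPE_CHECKING:
    # Only seen by type checkers; this import is skipped at runtime.
    from collections.abc import Iterable


def total_length(chunks: "Iterable[bytes]") -> int:
    """Sum the lengths of all chunks; the quoted hint is never evaluated."""
    return sum(len(chunk) for chunk in chunks)
```

Because the annotation is a string, the module runs even though `Iterable` is never imported at runtime, while a type checker still sees the full signature. A stub file would instead carry the same signature in a separate `.pyi` file.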

Member, Author:

and @tacaswell responded

It is worth re-opening the discussion of in-line type hints.
The world seems to have stabilized, and in other projects I have had type hints catch actual bugs before I ran the code (but I have also had to spend 15 minutes trying to sort out how to placate the type checker for code that clearly works a couple of times).

The type hints were useful for me while working on this, since VS Code pointed out some obvious mistakes in real time. But I agree that it is not ideal to leave the file only partially annotated, since it does not pass a full type check in its current state.

I could remove the extra type hints from this PR and make a separate PR to add hints to the whole file?

Contributor:

Sure, let's just go for it. I'm not going to be able to resist this forever 🙄

@tacaswell (Member):

Will these new tests be adjusted by #29816? If so we should sequence that one first.

@jkseppan (Member, Author):

Will these new tests be adjusted by #29816? If so we should sequence that one first.

I don't think these depend on FreeType, since the usetex case uses TeX for layout and parses dvi files to determine the coordinates of glyphs.

@jkseppan (Member, Author):

The earlier CI error seems to have been caused by a newer mypy version detecting some more types in an unrelated file. The fix is also in PR #30119.

jkseppan and others added 3 commits May 30, 2025 13:50

This reduces pdf file sizes when usetex is active, at the cost of some complexity in the code. We implement a charstring bytecode interpreter to keep track of subroutine calls in font programs.

Give dviread.DviFont a fake filename attribute and a get_fontmap method for character tracking.

In backend_pdf.py, refactor _get_subsetted_psname so it calls a method _get_subset_prefix, and reuse that to create tags for Type-1 fonts. Mark the methods static since they don't use anything from the instance.

Recommend merging to main to give people time to test this, not to a 3.10 point release.

Closes matplotlib#127.

Co-Authored-By: Elliott Sales de Andrade <quantum.analyst@gmail.com>
Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
The old optimization where the same font file and descriptor could be shared by multiple differently-encoded fonts is wrong in the presence of subsetting, unless we also take the exact glyph subset into account. It is very unlikely for the exact same subset to be used by different encodings, so just remove the optimization.

Rearrange _embedTeXFont in neater blocks.
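
The commit messages above mention subset tags and the need to account for the exact glyph subset. As an illustration only, the sketch below derives a tag from the set of glyph names, so that two fonts get the same tag only when their subsets match. The functions `subset_tag` and `subsetted_name` are hypothetical and are not the PR's `_get_subset_prefix`; the one fixed requirement, from the PDF specification, is that a subset tag is six uppercase letters followed by a plus sign.

```python
import hashlib
import string
from collections.abc import Iterable


def subset_tag(glyph_names: Iterable[str]) -> str:
    """Return a six-letter uppercase tag determined by the glyph subset."""
    # Sort so the tag does not depend on the order glyphs were encountered.
    joined = "\0".join(sorted(set(glyph_names)))
    n = int.from_bytes(hashlib.sha256(joined.encode("utf-8")).digest()[:8], "big")
    letters = []
    for _ in range(6):
        n, remainder = divmod(n, 26)
        letters.append(string.ascii_uppercase[remainder])
    return "".join(letters)


def subsetted_name(ps_name: str, glyph_names: Iterable[str]) -> str:
    """Prefix a PostScript font name with its subset tag, e.g. 'ABCDEF+CMR17'."""
    return f"{subset_tag(glyph_names)}+{ps_name}"
```

With a scheme like this, a cache keyed on the font file alone would wrongly merge two different subsets of the same font, which is the situation the commit message describes.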
@QuLogic (Member):

It looks like you'll need to install heuristica and DejaVuSans texlive packages in .github/workflows/tests.yml to get those new tests running.

@jkseppan (Member, Author) commented Jun 5, 2025 (edited):

It looks like you'll need to install heuristica and DejaVuSans texlive packages in .github/workflows/tests.yml to get those new tests running.

Yes, but Debian (and hence Ubuntu) bundles them into texlive-fonts-extra with a large number of other fonts. Not sure that slowing down all test runners is worth it, but perhaps we could do it in just one runner?

[Edit: Or we could see if we can install packages via tlmgr. Debian recommends against it but I don't think we need to worry about messing with package dependencies in a CI script.]

The DejaVu and Heuristica fonts are used by the type-1 font subsetting tests. Heuristica has a Cyrillic encoding and apparently cannot be loaded without installing texlive-lang-cyrillic.

@QuLogic (Member):

Here are the times for the "Install OS dependencies" step in the previous 3 commits:

Commit   3.11 minver  3.11   3.12   3.13   3.13 freethread  3.12 arm
5006acc  1m9s         1m21s  2m59s  1m52s  1m24s            3m54s
546a767  1m10s        1m24s  2m23s  1m52s  1m22s            3m19s
5c2f6bd  1m13s        1m15s  2m16s  2m42s  1m15s            2m20s

And for the 2 last commits with the additional 1 or 2 packages:

Commit   3.11 minver  3.11   3.12   3.13   3.13 freethread  3.12 arm
7741e9b  2m31s        2m36s  5m3s   2m50s  2m10s            3m25s
f411fb3  2m30s        2m58s  2m56s  2m34s  2m5s             5m15s

It looks like it would add about 1m per job?
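
The estimate can be checked with a few lines of arithmetic. This is a sketch using the timings quoted above; the helper names are hypothetical, and the one entry that appears as "2m42" in the comment is read as 2m42s.

```python
import re
from statistics import mean


def seconds(text: str) -> int:
    """Convert a duration such as '1m9s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match[1]) * 60 + int(match[2])


def column_means(rows: list[str]) -> list[float]:
    """Average each job's install time over the given commits."""
    parsed = [[seconds(item) for item in row.split()] for row in rows]
    return [mean(column) for column in zip(*parsed)]


before = [
    "1m9s 1m21s 2m59s 1m52s 1m24s 3m54s",
    "1m10s 1m24s 2m23s 1m52s 1m22s 3m19s",
    "1m13s 1m15s 2m16s 2m42s 1m15s 2m20s",  # "2m42" read as 2m42s
]
after = [
    "2m31s 2m36s 5m3s 2m50s 2m10s 3m25s",
    "2m30s 2m58s 2m56s 2m34s 2m5s 5m15s",
]

# Extra seconds per job, then the average over all six jobs.
extra = [a - b for a, b in zip(column_means(after), column_means(before))]
overall = mean(extra)
print([round(x) for x in extra], round(overall))
```

With only three and two samples per job the per-job differences are noisy, but the average across jobs is consistent with roughly a minute of added install time.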

Reviewers

@QuLogic approved these changes
@anntzer approved these changes
@jklymak left review comments
@ksunden left review comments

Assignees: No one assigned
Projects: Status: Ready for Review
Milestone: v3.11.0
Development

Successfully merging this pull request may close these issues.

When text.usetex=True with pdf backend, full subset of latex fonts is embedded into pdf file
7 participants: @jkseppan, @anntzer, @tacaswell, @QuLogic, @jklymak, @ksunden, @oscargus
