This reduces pdf file sizes when usetex is active, at the cost of
some complexity in the code. We implement a charstring bytecode
interpreter to keep track of subroutine calls in font programs.
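
As a rough illustration of the idea (not the code in this PR), the sketch below walks a decrypted Type 1 charstring and records which Subrs entries it reaches. The names `subrs_used` and `_run` and the `MAX_DEPTH` guard are made up for this example. Only callsubr, return, endchar, div, callothersubr and pop are interpreted; every other operator is treated as clearing the stack, and seac is not modeled.

```python
CALLSUBR, RETURN, ESCAPE, ENDCHAR = 10, 11, 12, 14
ESC_DIV, ESC_CALLOTHERSUBR, ESC_POP = 12, 16, 17
MAX_DEPTH = 10  # guard against runaway recursion in malformed fonts


def subrs_used(charstring, subrs):
    """Return the indices of all subroutines reachable from one charstring."""
    used = set()
    _run(charstring, subrs, [], [], used, 0)
    return used


def _run(code, subrs, stack, ps_stack, used, depth):
    """Interpret decrypted charstring bytes; return True once endchar is seen."""
    if depth > MAX_DEPTH:
        raise ValueError("subroutine calls nested too deeply")
    i = 0
    while i < len(code):
        b = code[i]
        i += 1
        if b >= 32:  # operand: one of the four number encodings
            if b <= 246:
                stack.append(b - 139)
            elif b <= 250:
                stack.append((b - 247) * 256 + code[i] + 108)
                i += 1
            elif b <= 254:
                stack.append(-(b - 251) * 256 - code[i] - 108)
                i += 1
            else:  # 255: 32-bit signed big-endian integer
                stack.append(int.from_bytes(code[i:i + 4], "big", signed=True))
                i += 4
        elif b == CALLSUBR:
            n = stack.pop()
            used.add(n)
            if _run(subrs[n], subrs, stack, ps_stack, used, depth + 1):
                return True
        elif b == RETURN:
            return False
        elif b == ENDCHAR:
            return True
        elif b == ESCAPE:  # two-byte operator
            op = code[i]
            i += 1
            if op == ESC_DIV:
                den = stack.pop()
                num = stack.pop()
                stack.append(num / den)
            elif op == ESC_CALLOTHERSUBR:
                stack.pop()  # othersubr number, not interpreted in this sketch
                count = stack.pop()
                for _ in range(count):  # hand the arguments to the "PostScript" side
                    ps_stack.append(stack.pop())
            elif op == ESC_POP:
                stack.append(ps_stack.pop())
            else:
                stack.clear()
        else:  # drawing and hinting operators consume their operands
            stack.clear()
    return False
```

Running this over every glyph that is kept gives the set of subroutines that must survive subsetting. The callothersubr/pop handling is what lets the common hint-replacement sequence (subr number, 1, 3, callothersubr, pop, callsubr) be followed.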

Recommend merging to main to give people time to test this, not to
a 3.10 point release.

Give dviread.DviFont a fake filename attribute and a get_fontmap
method for character tracking.

Add type hints to the code this touches.

Closes #127.

PR Checklist

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (run flake8 on changed files to check).
New features are documented, with examples if plot related.
Documentation is sphinx and numpydoc compliant (the docs should build without error).
Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

jkseppan added backend: pdf topic: text/fonts topic: text/usetex labels

Jul 22, 2021

jkseppan force-pushed the type1-subset branch from b66579d to 6546417

July 22, 2021 14:54

jkseppan added status: waiting for other PR status: work in progress labels

Jul 22, 2021

jkseppan force-pushed the type1-subset branch from 6546417 to 3aaf4c9

July 22, 2021 16:20

jkseppan force-pushed the type1-subset branch 2 times, most recently from f6861ad to d8ae364

August 30, 2021 08:00

Contributor

anntzer commented Oct 5, 2021

Is this ready for review, now that #20715 has been merged?

oscargus added status: needs rebase and removed status: waiting for other PR labels

Apr 14, 2022

jkseppan force-pushed the type1-subset branch from d8ae364 to 4dcd073

May 4, 2025 11:48

github-actionsbot removed status: needs rebase topic: text/fonts labels

May 4, 2025

jkseppan added the topic: text/fonts label

May 4, 2025

Member, Author

jkseppan commented May 4, 2025

This is failing on Ubuntu 22.04 and Windows but passing on 24.04 and Mac. Here's one failing image (test_usetex_pdf.png, so converted from pdf to png on the test system).

This looks like the font is entirely broken. The expected image similarly converted looks like this:

Strangely enough, the generated pdf file looks fine on my Mac.

Member, Author

jkseppan commented May 4, 2025

I can repeat the error running on an Ubuntu 22.04 docker image:

		GPL Ghostscript 9.55.0 (2021-09-27)
		Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
		This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
		see the file COPYING for details.
		Processing pages 1 through 1.
		Page 1
		   **** Error: can't process embedded font stream,
		        attempting to load the font using its name.
		               Output may be incorrect.
		Querying operating system for font files...
		Substituting font Helvetica for MPLAAD+CMR17.
		Loading NimbusSans-Regular font from /usr/share/ghostscript/9.55.0/Resource/Font/NimbusSans-Regular... 4467044 2957798 6795968 5451925 4 done.
		   **** Error: can't process embedded font stream,
		        attempting to load the font using its name.
		               Output may be incorrect.
		Substituting font Helvetica for MPLAAC+CMR12.
		   **** Error: can't process embedded font stream,
		        attempting to load the font using its name.
		               Output may be incorrect.
		Substituting font Helvetica for MPLAAA+CMEX10.
		   **** Error: can't process embedded font stream,
		        attempting to load the font using its name.
		               Output may be incorrect.
		Substituting font Helvetica-Oblique for MPLAAB+CMMI12.
		Loading NimbusSans-Italic font from /usr/share/ghostscript/9.55.0/Resource/Font/NimbusSans-Italic... 4534308 3175939 7138400 5761600 4 done.

The subsetting is wrong in some way that breaks Ghostscript 9.55 but not the viewer in macOS or the newer Ghostscript in Ubuntu 24.04. (Ghostscript 9.56 has a completely rewritten PDF interpreter.)
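
For anyone who wants to check for this failure mode in captured Ghostscript output, here is a small sketch that scans the text for the two symptoms seen in the log above: the embedded-font error and the font substitutions. The helper name `font_problems` is made up for this example, and the patterns are based only on the 9.55.0 messages quoted here, so other versions may word things differently.

```python
import re

EMBED_ERROR = "can't process embedded font stream"
# Matches lines such as "Substituting font Helvetica for MPLAAD+CMR17."
SUBSTITUTION = re.compile(r"Substituting font (\S+) for (\S+?)\.")


def font_problems(gs_output):
    """Summarize font-related trouble in Ghostscript's console output."""
    return {
        "embed_errors": gs_output.count(EMBED_ERROR),
        "substitutions": SUBSTITUTION.findall(gs_output),
    }
```

An empty `substitutions` list and zero `embed_errors` would suggest that Ghostscript accepted the embedded subsets.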

jkseppan force-pushed the type1-subset branch from 4dcd073 to 9c55b18

May 4, 2025 16:25

github-actionsbot removed the topic: text/fonts label

May 4, 2025

Member, Author

We tend not to have inline type hints (and I personally very much don't like them), but there are a few exceptions (e.g. _mathtext.py) so I guess it's up to you whether you want to leave them in.

Member, Author

Right – I saw that there are some files with type hints, and figured that the project might be in the process of adding them. I've found type hints pretty useful at work, but of course we should have a consistent style in the project. Has this been discussed in the past on the dev list or somewhere?

Contributor

anntzer May 9, 2025

I'm probably among the most vocal opponents, but I'll defer to @QuLogic or @ksunden on this aspect.

Member

ksunden May 12, 2025

I gave my opinion on our gitter, but I'll copy the main comment here for the sake of consolidation/potential for finding it in the future:

I would say type hints are a net positive in my opinion, though I acknowledge that there are problems (perhaps especially in a case like ours where APIs were designed well before type hints).
We went with stub files on initial implementation specifically to minimize some risk: since the stub files are not even loaded at runtime, there was no chance of them interfering.
However, I do think inline reduces a level of ongoing maintenance risk, in particular the chance of the two files getting out of sync. (We have tooling/CI in place to help catch such things, but I don't think it fully removes the risk.)
The other factor is that stub files allow us to draw the line at public APIs, and not have to worry about typing our internal logic. (Which has positives and negatives, positives being largely catching additional problems by type checker, negative largely being the surface area that needs to be covered.)
So in all, I think my personal recommendation would be a relatively slow uptake, where we keep the stub files for most public facing things in the interim, do inline type hints for internal logic when it makes sense to do so (e.g. when doing so is actively helpful for some refactor/feature addition/etc) then once the majority of internal logic has inline hints, consider inlining the hints of the more public facing APIs.

So looking at this example in particular, I think this does fall within my recommendation there. I am not personally opposed to it, but also not hard pulling for it. I do question a bit whether to advocate for "do minimal as you are working and motivated" or "the unit for adding type hints should be one file at a time". Doing the latter would carry the advantage that you get a better sense of how complete the hints are, but the disadvantage of asking people to type hint 3-4x what they were otherwise looking at (in this case, for example), which also impacts the review-ability of the PR as there are lots of changes that are actually orthogonal.

For what it's worth, this particular file already has one function which got an inline type hint added in #27796.

Member, Author

jkseppan May 13, 2025

and @tacaswell responded

It is worth re-opening the discussion of in-line type hints
the world seems to have stabilized and in other projects I have had type hints catch actual bugs before I ran the code (but I have also had to spend 15 minutes trying to sort out how to placate the type checker for code that clearly works a couple of times)

The type hints were useful for me while working on this, since VS Code pointed out some obvious mistakes in real time. But I agree that it is not ideal to leave the file only partially annotated, since it does not pass a full type check in its current state.

I could remove the extra type hints from this PR and make a separate PR to add hints to the whole file?

Contributor

anntzer May 13, 2025

Sure, let's just go for it. I'm not going to be able to resist this forever 🙄

anntzer reviewed

May 6, 2025

lib/matplotlib/_type1font.py Resolved

QuLogic reviewed

May 9, 2025

doc/api/next_api_changes/behavior/20716-JKS.rst Outdated, resolved

doc/users/next_whats_new/type1_subset.rst Outdated

Comment on lines 1 to 2

		Type 1 fonts are now subsetted in PDF output
		--------------------------------------------

Member

QuLogic May 9, 2025

Suggested change

	Type 1 fonts are now subsetted in PDF output
	--------------------------------------------
	Type 1 fonts are now subset in PDF output
	-----------------------------------------

Member, Author

jkseppan May 11, 2025

I think I disagree here... the English verb "set" is irregular in that way, but if you search for "subsetted" in the context of fonts, it seems to be fairly common, including in fonttools and various Adobe forums. See also the accepted answer to this question and possibly the discussion of "flied out" in Steven Pinker's Words and Rules.

Member

jklymak May 13, 2025

Maybe just be a bit more verbose?

Suggested change

	Type 1 fonts are now subsetted in PDF output
	--------------------------------------------
	PDFs embed just the subset of Type 1 glyphs that are used
	-----------------------------------------------------------

Member

QuLogic May 13, 2025

I don't disagree that this is the correct conjugation in the past tense, rather that these sentences are not in the past tense. It is stating what is and in the (foreseeable) future shall occur.

Member, Author

jkseppan May 14, 2025

I reworded the whole paragraph to be hopefully more understandable on its own.

doc/users/next_whats_new/type1_subset.rst Outdated


		When using the usetex feature with the PDF backend, Type 1 fonts are embedded
		in the PDF output. These fonts used to be embedded in full, but they are now
		subsetted to only include the glyphs that are actually used in the figure.

Member

QuLogic May 9, 2025

Suggested change

	subsetted to only include the glyphs that are actually used in the figure.
	subset to only include the glyphs that are actually used in the figure.

lib/matplotlib/_type1font.py Outdated, resolved

lib/matplotlib/backends/backend_pdf.py Outdated, resolved

lib/matplotlib/tests/test_usetex.py Outdated, resolved

Member, Author

jkseppan commented May 11, 2025 (edited)

I was trying to improve test coverage and discovered some cases where this implementation breaks the output. Math examples seem to work fine, but there are some text fonts that use less common features that I'm clearly not handling right. Here are two cases:

		@image_comparison(["subsetting-heuristica.pdf"])
		def test_subsetting_heuristica():
		    # Heuristica uses the callothersubr operator for some glyphs
		    mpl.rcParams['text.latex.preamble'] = '\n'.join((
		        r'\usepackage{heuristica}',
		        r'\usepackage[T1]{fontenc}',
		        r'\usepackage[utf8]{inputenc}'
		    ))
		    fig, ax = plt.subplots()
		    ax.text(0.1, 0.1, r"BHTem", usetex=True, fontsize=50)
		    ax.text(0.1, 0.3, "fi", usetex=True, fontsize=50)
		    ax.text(0.1, 0.5, "ffl", usetex=True, fontsize=50)
		    ax.set_xticks([])
		    ax.set_yticks([])


		@image_comparison(["subsetting-dejavusans.pdf"])
		def test_subsetting_dejavusans():
		    # DejaVuSans uses the seac operator to compose characters with diacritics
		    mpl.rcParams['text.latex.preamble'] = '\n'.join((
		        r'\usepackage{DejaVuSans}',
		        r'\usepackage[T1]{fontenc}',
		        r'\usepackage[utf8]{inputenc}'
		    ))
		    fig, ax = plt.subplots()
		    ax.text(0.1, 0.1, r"\textsf{ñäö}", usetex=True, fontsize=50)
		    ax.text(0.1, 0.3, r"\textsf{fi}", usetex=True, fontsize=50)
		    ax.text(0.1, 0.5, r"\textsf{ffl}", usetex=True, fontsize=50)
		    ax.set_xticks([])
		    ax.set_yticks([])

The Heuristica callothersubr feature actually doesn't seem broken, but the fi and ffl ligatures are lost in both fonts.

Both work without subsetting, although the metrics for the ligature glyphs seem to be wrong in the current code.

I'll see if I can figure out what's wrong.

Member, Author

jkseppan commented May 11, 2025

The ligature problem is probably because we don't apply the encoding from TeX's font configuration to the font before subsetting. The custom encoding array is output in the PDF file but should also be used to map from character codes to glyph names. The seac issue might be a different encoding problem where we should do the lookups using Adobe Standard Encoding and not the font's own encoding.
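
To make the two suspected problems concrete, here is a minimal sketch under stated assumptions; it is not the code in this PR. Encodings are modeled as plain dicts from character code to glyph name, `glyphs_to_keep` and `seac_components` are hypothetical helper names, and the table is only a five-entry excerpt of Adobe StandardEncoding. The codes 28 and 31 for the fi and ffl ligatures follow the T1 (Cork) layout and are used purely as an illustration.

```python
# Excerpt of Adobe StandardEncoding, just enough for the example below.
STANDARD_ENCODING_EXCERPT = {
    97: "a", 110: "n", 111: "o", 196: "tilde", 200: "dieresis",
}


def glyphs_to_keep(char_codes, tex_encoding, builtin_encoding):
    """Map the character codes used in the figure to glyph names.

    If TeX's font configuration supplies an encoding, it takes precedence
    over the encoding built into the font program.
    """
    encoding = tex_encoding if tex_encoding is not None else builtin_encoding
    names = {encoding[code] for code in char_codes if code in encoding}
    names.add(".notdef")  # always keep the fallback glyph
    return names


def seac_components(bchar, achar):
    """Resolve the base and accent codes of a seac call to glyph names.

    The lookup goes through StandardEncoding, not the font's own encoding.
    """
    return STANDARD_ENCODING_EXCERPT[bchar], STANDARD_ENCODING_EXCERPT[achar]
```

With a built-in encoding that has no entries at codes 28 and 31, the first function keeps only `.notdef`, which matches the symptom of the ligatures going missing; passing the TeX encoding keeps `fi` and `ffl`.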

jkseppan force-pushed the type1-subset branch from 677029c to 689aa54

May 11, 2025 18:18

Member, Author

jkseppan commented May 11, 2025

It seems that some of the latest changes broke compatibility with older Ghostscript again.

But while I debug that, a note about the new tests: they use font packages that are available on Debian or Ubuntu only by installing texlive-fonts-extra, which brings in a lot of other fonts too. Currently these tests get skipped on all runners, but would it make sense to install the extra fonts on just one of the runners to allow these tests to get run somewhere?

jkseppan force-pushed the type1-subset branch 2 times, most recently from 4a8b6ff to cb204cd

May 12, 2025 04:56

Member, Author

jkseppan commented May 12, 2025

I added a test using Bitstream Charter, which is part of texlive-fonts-recommended, so we get at least some coverage of the full Type-1 subsetting code path.

I fixed the gs compatibility issue, which was about a broken Encoding object.

anntzer reviewed

May 13, 2025

lib/matplotlib/_type1font.py Outdated

		lenIV = self.prop.get('lenIV', 4)
		encrypted = [
		    self._encrypt(charstrings[glyph], 'charstring', lenIV).decode('latin-1')
		    for glyph in glyphs

Contributor

anntzer May 13, 2025

Should this (and _subset_subrs below) sort the glyphs (and subrs) to ensure reproducibility? (as set ordering changes over runs)

Member, Author

jkseppan May 13, 2025

The subrs are already in order (the loop is for i in range(n_subrs)) but sorting the glyphs is a good idea.
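
A minimal sketch of the reproducibility point, assuming the glyph names arrive as a set of strings: string hashing is randomized per process unless PYTHONHASHSEED is fixed, so iterating over the set directly can give a different order on each run, while sorting the names pins the order down. The helper name `charstring_entries` is made up for this example.

```python
def charstring_entries(charstrings, glyphs):
    """Return (name, data) pairs for the kept glyphs in a run-independent order."""
    # sorted() removes the dependence on set iteration order
    return [(name, charstrings[name]) for name in sorted(glyphs)]
```

Writing the entries in this order keeps the subsetted font byte-for-byte stable across runs, provided the rest of the output is deterministic too.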

jkseppan force-pushed the type1-subset branch from cb204cd to b9c87f1

May 13, 2025 17:12

Member, Author

jkseppan commented May 13, 2025

I removed the extra type annotations, which were incomplete in any case. I'll make a separate PR to annotate the entire file.

jkseppan force-pushed the type1-subset branch from b9c87f1 to 1bc99cd

May 14, 2025 02:14

jklymak reviewed

May 14, 2025

doc/users/next_whats_new/type1_subset.rst

		The fonts that get used are usually "Type 1" fonts.
		They used to be embedded in full
		but are now limited to the glyphs that are actually used in the figure.
		This reduces the size of the resulting PDF files.

Member

jklymak May 14, 2025

This reads well to me. Thanks!

Member

tacaswell commented May 15, 2025

Will these new tests be adjusted by #29816? If so, we should sequence that one first.

Member, Author

jkseppan commented May 16, 2025

Will these new tests be adjusted by #29816? If so, we should sequence that one first.

I don't think these depend on FreeType, since the usetex case uses TeX for layout and parses dvi files to determine the coordinates of glyphs.

jkseppan force-pushed the type1-subset branch from 7ec8b45 to ede2526

May 29, 2025 13:57

github-actionsbot added the topic: text/mathtext label

May 29, 2025

Member, Author

jkseppan commented May 29, 2025

The earlier CI error seems to have been caused by a newer mypy version detecting some more types in an unrelated file. The fix is also in PR #30119.

github-actionsbot added the status: needs rebase label

May 30, 2025

jkseppan force-pushed the type1-subset branch from 5be0d4f to 58169e5

May 30, 2025 04:49

github-actionsbot removed the status: needs rebase label

May 30, 2025

jkseppan removed the status: work in progress label

May 30, 2025

QuLogic added this to the v3.11.0 milestone

May 30, 2025

QuLogic reviewed

May 30, 2025

lib/matplotlib/_type1font.py Outdated, resolved

lib/matplotlib/backends/backend_pdf.py

		self.writeObject(fontdictObject, fontdict)
		self.writeObject(widthsObject, widths)

Member

QuLogic May 30, 2025

Does this need to move after the fontdict? It should be reserved already.

Member, Author

jkseppan May 30, 2025

It doesn't matter which order we write them in, but I think the Widths object needs to be indirect and not included in the font dictionary. This does need to come after we know the font encoding, which in turn depends on the glyph set.

But I guess you mean that this would read better if we write out the widths object first? Either way works for me.

Member

QuLogic May 31, 2025

Yes, it did use to be written first as well, but more so that it would sit along with the other width-related code.

Member, Author

jkseppan May 31, 2025

I rearranged this function a bit, and realized that the shared-font-descriptor optimization is now useless and wrong.

jkseppan and others added 3 commits

May 30, 2025 13:50

Type-1 subsetting

76d8b3c

This reduces pdf file sizes when usetex is active, at the cost of
some complexity in the code. We implement a charstring bytecode
interpreter to keep track of subroutine calls in font programs.

Give dviread.DviFont a fake filename attribute and a get_fontmap
method for character tracking.

In backend_pdf.py, refactor _get_subsetted_psname so it calls a method
_get_subset_prefix, and reuse that to create tags for Type-1 fonts.
Mark the methods static since they don't use anything from the instance.

Recommend merging to main to give people time to test this, not to
a 3.10 point release.

Closes matplotlib#127.

Co-Authored-By: Elliott Sales de Andrade <quantum.analyst@gmail.com>

DOC: tweak wording in docstring

22198e9

Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>

Simplify match expression

53355ca

Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>

jkseppan force-pushed the type1-subset branch from 1d88a25 to 53355ca

May 30, 2025 10:52

github-actionsbot removed the topic: text/mathtext label

May 30, 2025

Use one font descriptor for each Type-1 font

c77a459

The old optimization where the same font file and descriptor could be
shared by multiple differently-encoded fonts is wrong in the presence of
subsetting, unless we also take the exact glyph subset into account.
It is very unlikely for the exact same subset to be used by different
encodings, so just remove the optimization.

Rearrange _embedTeXFont in neater blocks.

Labels

backend: pdf topic: text/usetex

7 participants

		@@ -35,10 +37,64 @@

		from matplotlib.cbook import _format_approx
		from . import _api
		if T.TYPE_CHECKING:
		    from collections.abc import Iterable

Type-1 font subsetting #20716
