lib/matplotlib/_type1font.py (resolved)

anntzer approved these changes

Contributor

anntzer left a comment

A few minor points to be considered, but overall this looks great.

anntzer reviewed

lib/matplotlib/_type1font.py Outdated

		postscript_stack: list[float],
		opcode: int | str,
		) -> tuple[set, set, list[float], list[float]]:
		"""Run one step in the charstring interpreter."""

Contributor

anntzer, May 6, 2025

I wonder if this may be clearer if you mutate buildchar_stack and postscript_stack in-place? this way you would write

	glyphs = set(); subrs = set()
	if opcode in {...}:
	    buildchar_stack[:] = []
	elif opcode == "seac":
	    codes = ...; glyphs.update(...)
	    buildchar_stack[:] = []
	elif opcode == "div":
	    num2 = buildchar_stack.pop()
	    num1 = buildchar_stack.pop()
	    buildchar_stack.append(num1/num2)
	...
	return glyphs, subrs

which feels perhaps more in the spirit of a postscript interpreter?
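
A minimal sketch of the in-place style suggested above, not the code from the PR: the hypothetical `step` function updates the caller's `buildchar_stack` through slice assignment and `pop`/`append`, so only the glyph and subroutine sets need to be returned. The opcode names handled here are an assumed subset chosen for illustration.

```python
def step(buildchar_stack, opcode):
    """Run one illustrative interpreter step, mutating the stack in place.

    Hypothetical sketch: the opcodes handled here are a small assumed subset.
    """
    glyphs = set()
    subrs = set()
    if isinstance(opcode, (int, float)):
        # Numbers are operands: push them onto the stack.
        buildchar_stack.append(opcode)
    elif opcode in {"hstem", "vstem", "rmoveto", "rlineto"}:
        # These operators consume the whole stack. Slice assignment empties
        # the caller's list; `buildchar_stack = []` would only rebind the
        # local name and leave the caller's list untouched.
        buildchar_stack[:] = []
    elif opcode == "div":
        num2 = buildchar_stack.pop()
        num1 = buildchar_stack.pop()
        buildchar_stack.append(num1 / num2)
    elif opcode == "callsubr":
        # The subroutine index is the topmost operand.
        subrs.add(buildchar_stack.pop())
    return glyphs, subrs


stack = []
for op in [6, 3, "div"]:
    step(stack, op)
# `stack` now holds the quotient; a following "rmoveto" would empty it.
```

Because the caller and the function share the same list object, no stack needs to be passed back, which is what keeps the return value down to the two sets.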

Member, Author

jkseppan, May 9, 2025

Could be! I'll think about this a little.

Member, Author

jkseppan, May 12, 2025

I moved this into a separate class with the stacks and glyph/subr sets as members. I agree that it looks clearer that way.
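
The refactoring described here might look roughly like the sketch below, where the stacks and the glyph/subr sets live on one object. The class name `CharStringTracker`, its method names, and the reduced opcode set are assumptions for illustration; the actual class in the PR may differ.

```python
from dataclasses import dataclass, field


@dataclass
class CharStringTracker:
    """Hypothetical holder for interpreter state (not matplotlib's class)."""
    buildchar_stack: list = field(default_factory=list)
    postscript_stack: list = field(default_factory=list)
    glyphs: set = field(default_factory=set)
    subrs: set = field(default_factory=set)

    def step(self, opcode):
        """Process one already-decoded opcode or numeric operand."""
        if isinstance(opcode, (int, float)):
            self.buildchar_stack.append(opcode)
        elif opcode == "callsubr":
            # Record which subroutine the charstring depends on.
            self.subrs.add(self.buildchar_stack.pop())
        elif opcode == "pop":
            # Move one value from the PostScript stack to the BuildChar stack.
            self.buildchar_stack.append(self.postscript_stack.pop())
        else:
            # Assumption: every other operator simply consumes its operands.
            self.buildchar_stack.clear()

    def run(self, opcodes):
        for opcode in opcodes:
            self.step(opcode)
        return self


tracker = CharStringTracker().run([100, 0, "rmoveto", 3, "callsubr", "endchar"])
print(sorted(tracker.subrs))  # [3]
```

Keeping the sets as members means each step only records what it finds, and the caller reads `glyphs` and `subrs` once at the end.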

anntzer reviewed

lib/matplotlib/_type1font.py Outdated (resolved)

QuLogic reviewed

May 9, 2025

doc/api/next_api_changes/behavior/20716-JKS.rst Outdated (resolved)

doc/users/next_whats_new/type1_subset.rst Outdated

Comment on lines 1 to 2

		Type 1 fonts are now subsetted in PDF output
		--------------------------------------------

Member

QuLogic, May 9, 2025

Suggested change

	Type 1 fonts are now subsetted in PDF output
	--------------------------------------------
	Type 1 fonts are now subset in PDF output
	-----------------------------------------

Member, Author

jkseppan, May 11, 2025

I think I disagree here... the English verb "set" is irregular in that way, but if you search for "subsetted" in the context of fonts, it seems to be fairly common, including in fonttools and various Adobe forums. See also the accepted answer to this question and possibly the discussion of "flied out" in Steven Pinker's Words and Rules.

Member

jklymak, May 13, 2025

Maybe just be a bit more verbose?

Suggested change

	Type 1 fonts are now subsetted in PDF output
	--------------------------------------------
	PDFs embed just the subset of Type 1 glyphs that are used
	-----------------------------------------------------------

Member

QuLogic, May 13, 2025

I don't disagree that this is the correct conjugation in the past tense, rather that these sentences are not in the past tense. It is stating what is and in the (foreseeable) future shall occur.

Member, Author

jkseppan, May 14, 2025

I reworded the whole paragraph to be hopefully more understandable on its own.

doc/users/next_whats_new/type1_subset.rst Outdated


		When using the usetex feature with the PDF backend, Type 1 fonts are embedded
		in the PDF output. These fonts used to be embedded in full, but they are now
		subsetted to only include the glyphs that are actually used in the figure.

Member

QuLogic, May 9, 2025

Suggested change

	subsetted to only include the glyphs that are actually used in the figure.
	subset to only include the glyphs that are actually used in the figure.

lib/matplotlib/_type1font.py Outdated (resolved)

lib/matplotlib/backends/backend_pdf.py Outdated (resolved)

lib/matplotlib/tests/test_usetex.py Outdated (resolved)

Member, Author

jkseppan commented May 11, 2025 (edited)

I was trying to improve test coverage and discovered some cases where this implementation breaks the output. Math examples seem to work fine, but there are some text fonts that use less common features that I'm clearly not handling right. Here are two cases:

	@image_comparison(["subsetting-heuristica.pdf"])
	def test_subsetting_heuristica():
	    # Heuristica uses the callothersubr operator for some glyphs
	    mpl.rcParams['text.latex.preamble'] = '\n'.join((
	        r'\usepackage{heuristica}',
	        r'\usepackage[T1]{fontenc}',
	        r'\usepackage[utf8]{inputenc}'
	    ))
	    fig, ax = plt.subplots()
	    ax.text(0.1, 0.1, r"BHTem", usetex=True, fontsize=50)
	    ax.text(0.1, 0.3, "fi", usetex=True, fontsize=50)
	    ax.text(0.1, 0.5, "ffl", usetex=True, fontsize=50)
	    ax.set_xticks([])
	    ax.set_yticks([])


	@image_comparison(["subsetting-dejavusans.pdf"])
	def test_subsetting_dejavusans():
	    # DejaVuSans uses the seac operator to compose characters with diacritics
	    mpl.rcParams['text.latex.preamble'] = '\n'.join((
	        r'\usepackage{DejaVuSans}',
	        r'\usepackage[T1]{fontenc}',
	        r'\usepackage[utf8]{inputenc}'
	    ))
	    fig, ax = plt.subplots()
	    ax.text(0.1, 0.1, r"\textsf{ñäö}", usetex=True, fontsize=50)
	    ax.text(0.1, 0.3, r"\textsf{fi}", usetex=True, fontsize=50)
	    ax.text(0.1, 0.5, r"\textsf{ffl}", usetex=True, fontsize=50)
	    ax.set_xticks([])
	    ax.set_yticks([])

The Heuristica callothersubr feature actually doesn't seem broken, but the fi and ffl ligatures are lost in both fonts.

Both work without subsetting, although the metrics for the ligature glyphs seem to be wrong in the current code.

I'll see if I can figure out what's wrong.

Member, Author

jkseppan commented May 11, 2025

The ligature problem is probably because we don't apply the encoding from TeX's font configuration to the font before subsetting. The custom encoding array is output in the PDF file but should also be used to map from character codes to glyph names. The seac issue might be a different encoding problem where we should do the lookups using Adobe Standard Encoding and not the font's own encoding.
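
To illustrate the two lookups described above, here is a small sketch under stated assumptions. The helper names and the tiny encoding tables are hypothetical excerpts made up for the example; only the idea is taken from the comment: character codes coming from TeX go through TeX's encoding array first, while the component codes of seac are interpreted in Adobe Standard Encoding.

```python
# Excerpt of Adobe Standard Encoding (code -> glyph name); only the entries
# needed for this example are listed.
STANDARD_ENCODING = {97: "a", 110: "n", 196: "tilde", 200: "dieresis"}


def glyphs_for_codes(codes, font_encoding, tex_encoding=None):
    """Map character codes to glyph names, preferring TeX's encoding array.

    Hypothetical helper: both encodings are dicts from code to glyph name.
    """
    encoding = tex_encoding if tex_encoding is not None else font_encoding
    return {encoding[code] for code in codes if code in encoding}


def seac_components(bchar, achar):
    """Return the base and accent glyph names referenced by a seac call."""
    return STANDARD_ENCODING[bchar], STANDARD_ENCODING[achar]


# Assumed example: the font's built-in encoding has no ligature at code 28,
# but the encoding array from TeX's font configuration maps it to "fi".
font_encoding = {102: "f", 105: "i"}
tex_encoding = {102: "f", 105: "i", 28: "fi"}

print(glyphs_for_codes({28}, font_encoding))                # set()
print(glyphs_for_codes({28}, font_encoding, tex_encoding))  # {'fi'}
print(seac_components(110, 196))                            # ('n', 'tilde')
```

With only the built-in encoding, the ligature glyph is never marked as used and so would be dropped by the subsetter, which matches the symptom reported for fi and ffl.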

jkseppan force-pushed the type1-subset branch from 677029c to 689aa54

May 11, 2025 18:18

Member, Author

jkseppan commented May 11, 2025

It seems that some of the latest changes broke compatibility with older Ghostscript versions again.

But while I debug that, a note about the new tests: they use font packages that are available on Debian or Ubuntu only by installing texlive-fonts-extra, which brings in a lot of other fonts too. Currently these tests get skipped on all runners, but would it make sense to install the extra fonts on just one of the runners to allow these tests to get run somewhere?
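
One way such tests could decide whether to skip is to ask TeX's own file lookup whether the package exists. The helper below is a sketch with a hypothetical name, not the mechanism matplotlib's test suite uses; it relies only on the `kpsewhich` command-line tool that ships with TeX distributions.

```python
import shutil
import subprocess


def has_tex_package(package):
    """Return True if kpsewhich can locate `<package>.sty` (hypothetical helper)."""
    kpsewhich = shutil.which("kpsewhich")
    if kpsewhich is None:
        # No TeX installation on PATH, so the package cannot be used.
        return False
    result = subprocess.run(
        [kpsewhich, f"{package}.sty"],
        capture_output=True, text=True, check=False,
    )
    # kpsewhich prints the resolved path and exits with status 0 on success.
    return result.returncode == 0 and bool(result.stdout.strip())
```

A test could then be guarded with `pytest.mark.skipif(not has_tex_package("heuristica"), reason="needs heuristica")`, so it runs only on a runner where texlive-fonts-extra is installed.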

jkseppan force-pushed the type1-subset branch 2 times, most recently from 4a8b6ff to cb204cd

May 12, 2025 04:56

Member, Author

jkseppan commented May 12, 2025

I added a test using Bitstream Charter, which is part of texlive-fonts-recommended, so we get at least some coverage of the full Type-1 subsetting code path.

I fixed the gs compatibility issue, which was about a broken Encoding object.

anntzer reviewed

May 13, 2025

lib/matplotlib/_type1font.py Outdated

		lenIV = self.prop.get('lenIV', 4)
		encrypted = [
		self._encrypt(charstrings[glyph], 'charstring', lenIV).decode('latin-1')
		for glyph in glyphs

Contributor

anntzer, May 13, 2025

Should this (and _subset_subrs below) sort the glyphs (and subrs) to ensure reproducibility? (as set ordering changes over runs)

Member, Author

jkseppan, May 13, 2025

The subrs are already in order (the loop is for i in range(n_subrs)), but sorting the glyphs is a good idea.
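
The reproducibility concern can be seen in a small sketch: iteration order over a set of strings can differ between interpreter runs because string hashing is randomized by default, so iterating over `sorted(glyphs)` keeps the output stable. The function name and the output format below are made up for illustration.

```python
def emit_charstrings(glyphs, charstrings):
    """Return one line per glyph in a deterministic order.

    Hypothetical helper: `charstrings` maps glyph names to already-encoded
    strings, and `glyphs` is the set of names to keep.
    """
    # sorted() fixes the order regardless of how the set happens to iterate.
    return [f"/{name} {charstrings[name]}" for name in sorted(glyphs)]


charstrings = {"a": "<A>", "fi": "<FI>", "n": "<N>"}
print(emit_charstrings({"n", "a", "fi"}, charstrings))
# ['/a <A>', '/fi <FI>', '/n <N>']
```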

jkseppan force-pushed the type1-subset branch from cb204cd to b9c87f1

May 13, 2025 17:12

Member, Author

jkseppan commented May 13, 2025

I removed the extra type annotations, which were incomplete in any case. I'll make a separate PR to annotate the entire file.

jkseppan force-pushed the type1-subset branch from b9c87f1 to 1bc99cd

May 14, 2025 02:14

jklymak reviewed

May 14, 2025

doc/users/next_whats_new/type1_subset.rst

		The fonts that get used are usually "Type 1" fonts.
		They used to be embedded in full
		but are now limited to the glyphs that are actually used in the figure.
		This reduces the size of the resulting PDF files.

Member

jklymak, May 14, 2025

This reads well to me. Thanks!

Member

tacaswell commented May 15, 2025

Will these new tests be adjusted by #29816? If so, we should sequence that one first.

Member, Author

jkseppan commented May 16, 2025

> Will these new tests be adjusted by #29816? If so, we should sequence that one first.

I don't think these depend on FreeType, since the usetex case uses TeX for layout and parses dvi files to determine the coordinates of glyphs.

jkseppan force-pushed the type1-subset branch from 7ec8b45 to ede2526

May 29, 2025 13:57

github-actions bot added the topic: text/mathtext label

May 29, 2025

Member, Author

jkseppan commented May 29, 2025

The earlier CI error seems to have been caused by a newer mypy version detecting some more types in an unrelated file. The fix is also in PR #30119.

github-actions bot added the status: needs rebase label

jkseppan and others added 3 commits

May 30, 2025 04:26

Type-1 subsetting

413a4d9

This reduces pdf file sizes when usetex is active, at the cost of some complexity in the code. We implement a charstring bytecode interpreter to keep track of subroutine calls in font programs.

Give dviread.DviFont a fake filename attribute and a get_fontmap method for character tracking.

In backend_pdf.py, refactor _get_subsetted_psname so it calls a method _get_subset_prefix, and reuse that to create tags for Type-1 fonts. Mark the methods static since they don't use anything from the instance.

Recommend merging to main to give people time to test this, not to a 3.10 point release.

Closes matplotlib#127.

Co-Authored-By: Elliott Sales de Andrade <quantum.analyst@gmail.com>
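
For context on the subset tags mentioned in the commit message: the PDF specification requires the name of a subset font to start with a tag of exactly six uppercase letters followed by a plus sign. The sketch below derives such a tag deterministically from the set of used character codes; the function names and the hashing scheme are assumptions, and `_get_subset_prefix` in matplotlib may work differently.

```python
import hashlib
import string


def subset_prefix(char_codes):
    """Derive a six-letter uppercase tag from the used character codes.

    Hypothetical scheme: hash the sorted codes and map digest bytes to A-Z.
    """
    data = ",".join(str(code) for code in sorted(char_codes)).encode("ascii")
    digest = hashlib.sha256(data).digest()
    return "".join(string.ascii_uppercase[b % 26] for b in digest[:6])


def subsetted_name(ps_name, char_codes):
    """Prefix a PostScript font name with the subset tag, e.g. ABCDEF+Name."""
    return f"{subset_prefix(char_codes)}+{ps_name}"


name = subsetted_name("CMR10", {65, 66, 102})
```

Deriving the tag from the character set, rather than from a counter or random value, means two runs that use the same glyphs produce the same font name.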

DOC: tweak wording in docstring

1fea704

Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>

Add some types to _mathtext.py

58169e5

Mypy 1.16.0 flags errors here:

	lib/matplotlib/_mathtext.py:2531: error: "Node" has no attribute "width"  [attr-defined]
	lib/matplotlib/_mathtext.py:2608: error: List item 0 has incompatible type "Kern"; expected "Hlist | Vlist"  [list-item]

The check for the attribute _metrics is equivalent to checking for an instance of Char, since only Char and its subclasses set self._metrics. Mypy infers an unnecessarily tight type list[Hlist | Vlist] for spaced_nucleus so we give it a more general one.
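
The two kinds of fix described in this commit message can be shown with toy classes. The classes below are simplified stand-ins that only borrow their names from the message, not matplotlib's real definitions: an `isinstance` check is something a type checker narrows on, and an explicit annotation keeps a list from being inferred too tightly from its first item.

```python
class Node:
    """Toy base class (stand-in, not matplotlib's Node)."""


class Char(Node):
    def __init__(self, width: float) -> None:
        self.width = width
        self._metrics = {"width": width}


class Kern(Node):
    pass


class Hlist(Node):
    pass


def width_of(node: Node) -> float:
    # isinstance() lets a type checker narrow `node` from Node to Char,
    # so the attribute access below type-checks.
    if isinstance(node, Char):
        return node.width
    return 0.0


# Without the annotation, the list type would be inferred from its first
# item alone (list[Hlist]); annotating with the base class allows mixing.
spaced: list[Node] = [Hlist()]
spaced.append(Kern())
```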

jkseppan force-pushed the type1-subset branch from 5be0d4f to 58169e5

May 30, 2025 04:49

github-actions bot removed the status: needs rebase label

jkseppan removed the status: work in progress label

QuLogic added this to the v3.11.0 milestone