1. If characters were used from across various font blocks (especially for type 3, whose blocks hold only 256 characters), then there would be many, possibly sparsely-populated, font subsets.
2. Sometimes a single character code may map to multiple glyphs. This is the case if you mix languages (e.g., Add language parameter to Text objects #29794), but the naive tracking would only produce one glyph.
3. Sometimes multiple characters can map to a single glyph. This is the case for ligatures, and also with complex text shaping (such as Arabic), and this would just fail when calling `ord` on a multi-char string.
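
To make the first and third problems concrete, here is a minimal sketch of what block-based tracking looks like. It is not the code that was in Matplotlib; `naive_track` and its `subset_size` default are hypothetical names chosen for illustration.

```python
def naive_track(text, subset_size=256):
    """Hypothetical naive tracker: split each code point into (block, offset)."""
    used = {}
    for char in text:
        subset, code = divmod(ord(char), subset_size)
        used.setdefault(subset, set()).add(code)
    return used


# One Latin, one Greek and one CJK character each open their own subset,
# and every one of those subsets holds a single glyph.
sparse = naive_track("a\u03b1\u4e2d")

# ord() only accepts a single character, so a shaped cluster such as a
# ligature cannot be tracked this way at all.
try:
    ord("ffi")
    ligature_error = None
except TypeError as exc:
    ligature_error = exc
```

Here `sparse` ends up with three subsets for three characters, and `ligature_error` holds the `TypeError` raised for the multi-character string.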

To fix this, `CharacterTracker` now tracks characters and glyphs more closely. Specifically,

for each font, a (character code(s), glyph index)-pair is mapped to a (subset index, subset character code)-pair. This ensures that point 2 above is handled.
If the above map doesn't exist yet, then a subset index/character code is calculated:
1. if the (singular) character code is in the first block (below 256 for type 3, or below 64k for type 42), then keep the character code the same and put it in subset 0; this preserves the text in those lower ranges if you happen to be looking at a PDF directly
2. if the (singular) character code is already in subset 0, then bump it to the next available spot; a conflict here means the character is being used with multiple glyphs (i.e., another case for point 2 above)
3. if the character code is in fact multiple character codes, then also bump to the next available spot as it could never be in subset 0 (this is point 3 above)
4. the next available spot is the next character code in the next subset block, if necessary; by filling as needed, this takes care of point 1 above
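
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `CharacterTracker` code: the class name `SubsetAllocator`, its attributes, and the glyph indices in the usage lines are all made up, and the sketch assumes overflow codes start at 0 in subset 1.

```python
class SubsetAllocator:
    """Hypothetical per-font allocator following the four steps above."""

    def __init__(self, subset_size):
        self.subset_size = subset_size
        # (character code(s), glyph index) -> (subset index, subset char code)
        self.mapping = {}
        # Character codes already claimed in subset 0.
        self.first_block = set()
        # Number of entries pushed past subset 0 so far.
        self.overflow = 0

    def track(self, chars, glyph_index):
        key = (chars, glyph_index)
        if key in self.mapping:
            return self.mapping[key]
        if (len(chars) == 1
                and ord(chars) < self.subset_size
                and ord(chars) not in self.first_block):
            # Step 1: keep the character code unchanged in subset 0.
            code = ord(chars)
            self.first_block.add(code)
            result = (0, code)
        else:
            # Steps 2-4: take the next free spot, opening a new subset
            # block only when the current one is full.
            subset, code = divmod(self.overflow, self.subset_size)
            result = (subset + 1, code)
            self.overflow += 1
        self.mapping[key] = result
        return result


alloc = SubsetAllocator(subset_size=256)
first = alloc.track("A", 36)         # fits in subset 0 at its own code
repeat = alloc.track("A", 36)        # same pair, same answer
variant = alloc.track("A", 912)      # same character, different glyph
ligature = alloc.track("ffi", 1234)  # several characters, one glyph
cjk = alloc.track("\u4e2d", 5000)    # code beyond the first block
```

In this sketch the second glyph for "A", the ligature and the CJK character all share subset 1, rather than each landing in a separate block.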

With these changes, the complex/font features/languages tests in #30607 produce correct results.

PR checklist

[n/a] "closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
[n/a] Plotting related features are demonstrated in an example
[n/a] New Features and API Changes are noted with a directive and release note
[n/a] Documentation complies with general and docstring guidelines

QuLogic added this to the v3.11.0 milestone

Sep 27, 2025

QuLogic added this to Font and text overhaul

Sep 27, 2025

github-project-automation bot moved this to Waiting for other PR in Font and text overhaul

Sep 27, 2025

QuLogic moved this from Waiting for other PR to Ready for Review in Font and text overhaul

Sep 27, 2025

github-actions bot added backend: ps backend: pdf labels

Sep 27, 2025

QuLogic force-pushed the simpler-track branch from 58874b6 to 5afd71b

September 27, 2025 03:21

QuLogic changed the title ~~Deduplicate CharacterTracker.track implementation~~ Prepare CharacterTracker for advanced font features

Sep 27, 2025

QuLogic (Member, Author) commented Sep 27, 2025

Note, I think the original commit was small, and the remaining ended up small enough, that I just put them all in this PR.

QuLogic mentioned this pull request

Sep 27, 2025

Implement libraqm for vector outputs #30607

Merged

5 tasks

anntzer reviewed

Sep 29, 2025

lib/matplotlib/backends/_backend_pdf_ps.py (5 outdated threads, resolved)

lib/matplotlib/backends/backend_pdf.py (1 outdated thread, resolved)

lib/matplotlib/backends/backend_ps.py Outdated

     print("%%BeginProlog", file=fh)
     if not mpl.rcParams['ps.useafm']:
-        Ndict += len(ps_renderer._character_tracker.used)
+        Ndict += sum(map(len, ps_renderer._character_tracker.used.values()), 0)

anntzer (Contributor), Sep 29, 2025

The 0 at the end is unneeded.
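
A quick check of that remark: the built-in `sum` already starts from 0, so passing it explicitly changes nothing, even for an empty input. The `used` dictionary below is a stand-in with an assumed shape, not the real contents of `_character_tracker.used`.

```python
# Stand-in data; the real structure of `used` is an assumption here.
used = {
    "DejaVuSans.ttf": {0: {65, 66}, 1: {0}},
    "cmr10.ttf": {0: {97}},
}

with_start = sum(map(len, used.values()), 0)
without_start = sum(map(len, used.values()))
empty_total = sum(map(len, {}.values()))
```

Both calls count the same number of subsets, and the empty case still yields an integer zero.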

anntzer reviewed

Sep 29, 2025

lib/matplotlib/backends/backend_ps.py (resolved)

Deduplicate CharacterTracker.track implementation

50f76ff

No need to repeat the calculation of subset blocks, but instead offload it to `track_glyph`.

QuLogic force-pushed the simpler-track branch from 5e89363 to 662ac58

September 30, 2025 00:27

QuLogic added 2 commits

September 30, 2025 00:57

pdf/ps: Compress subsetted font blocks

8274e17

Instead of splitting fonts into `subset_size` blocks and writing text as character code modulo `subset_size`, compress the blocks by doing two things:

1. Preserve the character code if it lies in the first block. This keeps ASCII (for Type 3) and the Basic Multilingual Plane (for Type 42) as their normal codes.
2. Push everything else into the next spot in the next block, splitting by `subset_size` as necessary.

This should reduce the number of additional font subsets to embed.
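
As a rough way to see the saving, the sketch below counts how many subsets each scheme would need for the same characters. Both helper functions are hypothetical and ignore glyph conflicts; they only model the block arithmetic described in this commit message.

```python
def modulo_subsets(charcodes, subset_size):
    """Subsets needed when each code stays in its own fixed block."""
    return {code // subset_size for code in charcodes}


def compressed_subsets(charcodes, subset_size):
    """Subsets needed when codes past the first block are packed together."""
    subsets = set()
    overflow = 0
    for code in sorted(set(charcodes)):
        if code < subset_size:
            subsets.add(0)
        else:
            subsets.add(1 + overflow // subset_size)
            overflow += 1
    return subsets


# Latin, Greek, Cyrillic, an arrow and a CJK character, with Type 3 blocks.
codes = [ord(c) for c in "A\u03b1\u0436\u2192\u4e2d"]
before = modulo_subsets(codes, 256)
after = compressed_subsets(codes, 256)
```

For these five characters the fixed-block scheme touches five blocks, while the packed scheme needs only subset 0 plus one overflow subset.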

pdf: Fix first-block characters using multiple glyph representations

70dc388

If mixing languages, sometimes a single character may use different glyphs in one document. In that case, we need to give it a new character code in the next subset, since subset 0 is preserving character codes.

QuLogic force-pushed the simpler-track branch from 662ac58 to c21a0b0

September 30, 2025 05:18

QuLogic (Member, Author) commented Sep 30, 2025

OK, I've handled all your comments, I think. I also fixed subsetting in the PostScript backend, noted above.

There are 3 test image changes:

2 PDF tests think that the fourth emoji moved slightly; I think this is because the characters on the line are now from the same font instead of split between emoji/not. But this seems very small, as viewing it at 100% doesn't really seem to show any difference.
1 EPS test change; this is because subsetting is now "real", so the switch from type 3 to type 42 no longer happens. Ghostscript seems to convert those to raster a little differently, but on the upside, now the Computer Modern and DejaVu fonts look the same.

anntzer reviewed

Sep 30, 2025

lib/matplotlib/backends/backend_pdf.py (outdated, resolved)

anntzer approved these changes

Sep 30, 2025

anntzer (Contributor) left a comment

Just a minor point regarding a comment.

QuLogic added 2 commits

September 30, 2025 03:23

pdf: Support multi-character glyphs when subsetting

df670cf

For ligatures or complex shapings, multiple characters may map to a single glyph. In this case, we still want to output a single character code for the string using the font subset, but the `ToUnicode` map should give back all the characters.
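
To illustrate what "give back all the characters" can look like, the sketch below formats one entry in the style of a PDF `ToUnicode` CMap `bfchar` line, where the destination is the UTF-16BE encoding of the full string. The helper `to_unicode_entry` and its `code_bytes` parameter are hypothetical and are not taken from the backend.

```python
def to_unicode_entry(subset_code, chars, code_bytes=1):
    """Format a bfchar-style line mapping one subset code to a whole string."""
    # Source code, zero-padded to the width of the font's character codes.
    src = f"{subset_code:0{code_bytes * 2}X}"
    # Destination: every character of the cluster, encoded as UTF-16BE hex.
    dst = chars.encode("utf-16-be").hex().upper()
    return f"<{src}> <{dst}>"


ligature_entry = to_unicode_entry(1, "ffi")
emoji_entry = to_unicode_entry(2, "\U0001F600", code_bytes=2)
```

The ligature gets one source code but three UTF-16 code units on the right-hand side, so copying text out of the PDF can still recover "ffi". Characters outside the Basic Multilingual Plane come out as a surrogate pair.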

ps: Fix font subset handling

ed5e074

Previously, this was supposed to "upgrade" type 3 to type 42 if the number of glyphs overflowed. However, as `CharacterTracker` can suggest a new subset for other reasons (i.e., multiple glyphs for the same character or a glyph for multiple characters may go to a second subset), we do need proper subset handling here as well.

Since that is now done, we can drop the "promotion" from type 3 to type 42, as we don't get too many glyphs in each embedded font.

QuLogic force-pushed the simpler-track branch from c21a0b0 to d781040

September 30, 2025 07:23