
Prepare `CharacterTracker` for advanced font features #30608


Merged
QuLogic merged 5 commits into matplotlib:text-overhaul from QuLogic:simpler-track on Oct 2, 2025

QuLogic (Member) commented Sep 27, 2025 (edited)

PR summary

The original code split fonts into subsets based on the character code modulo the subset size (determined by font type limits). This had some limitations:

  1. If characters were used from across various font blocks (especially for type 3, whose blocks were only 256 characters), then there would be many, possibly sparsely populated, font subsets.
  2. Sometimes a single character code may map to multiple glyphs. This is the case if you mix languages (e.g., #29794, "Add language parameter to Text objects"), but the naive tracking would only produce one glyph.
  3. Sometimes multiple characters can map to a single glyph. This is the case for ligatures and for complex text shaping (such as Arabic), and the old code would simply fail when calling `ord` on a multi-character string.
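To make limitations 1 and 3 concrete, here is a minimal sketch of a modulo-style split, assuming the subset index is the character code divided by the subset size and the in-subset code is the remainder. The helper `old_subset` is hypothetical and is not matplotlib's actual code.

```python
def old_subset(char, subset_size):
    """Hypothetical modulo-style split: return (subset index, code in subset)."""
    code = ord(char)  # raises TypeError if `char` has more than one character
    return divmod(code, subset_size)


# Characters from distant Unicode blocks each open their own Type 3 subset.
text = "A\u00e9\u20ac\U0001D6FC"  # 'A', 'é', '€', MATHEMATICAL ITALIC SMALL ALPHA
used_subsets = sorted({old_subset(c, 256)[0] for c in text})
print(used_subsets)  # [0, 32, 470]: two of the three subsets hold a single glyph

# A ligature's source text ("fi") has two characters, so `ord` fails.
try:
    old_subset("fi", 256)
except TypeError as err:
    print("TypeError:", err)
```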

To fix this, `CharacterTracker` now tracks characters and glyphs more closely. Specifically:

  1. for each font, a (character code(s), glyph index)-pair is mapped to a (subset index, subset character code)-pair. This ensures that point 2 above is handled.
  2. If the above map doesn't exist yet, then a subset index/character code is calculated:
    1. if the (singular) character code is in the first block (codes below 256 for type 3, or below 64k for type 42), then keep the character code the same and put it in subset 0; this preserves the text in those lower ranges if you happen to be looking at a PDF directly
    2. if the (singular) character code is already in subset 0, then bump it to the next available spot; a conflict here means the character is being used with multiple glyphs (i.e., another case for point 2 above)
    3. if the character code is in fact multiple character codes, then also bump it to the next available spot, as it could never be in subset 0 (this is point 3 above)
    4. the next available spot is the next character code in the next subset block, if necessary; by filling as needed, this takes care of point 1 above
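The steps above can be sketched in Python as follows. This is a minimal illustration assuming a single font, a dictionary keyed by `(chars, glyph)`, and sequential filling of the subsets after subset 0. The class `SubsetTrackerSketch` and its attributes are hypothetical (not the real `CharacterTracker` API), and the glyph indices in the usage lines are made up.

```python
class SubsetTrackerSketch:
    """Hypothetical per-font tracker following the numbered steps above."""

    def __init__(self, subset_size):
        self.subset_size = subset_size  # e.g. 256 for type 3 (assumed)
        self.mapping = {}  # (chars, glyph index) -> (subset index, subset code)
        self.subset0_codes = set()  # codes already claimed in subset 0
        self.next_spot = subset_size  # flat position of the next free slot after subset 0

    def track_glyph(self, chars, glyph):
        key = (chars, glyph)
        if key in self.mapping:  # step 1: reuse an existing assignment
            return self.mapping[key]
        if len(chars) == 1:
            code = ord(chars)
            # step 2.1: keep low codes unchanged in subset 0, unless step 2.2 applies
            if code < self.subset_size and code not in self.subset0_codes:
                self.subset0_codes.add(code)
                self.mapping[key] = (0, code)
                return (0, code)
        # steps 2.2 to 2.4: conflicts, multi-character strings, and high codes
        # take the next free slot, rolling over into a new subset when one fills up.
        result = divmod(self.next_spot, self.subset_size)
        self.next_spot += 1
        self.mapping[key] = result
        return result


tracker = SubsetTrackerSketch(256)
print(tracker.track_glyph("A", 36))        # (0, 65): low code kept in subset 0
print(tracker.track_glyph("A", 900))       # (1, 0): same character, different glyph
print(tracker.track_glyph("fi", 1234))     # (1, 1): ligature source text
print(tracker.track_glyph("\u20ac", 200))  # (1, 2): outside the first block
print(tracker.track_glyph("A", 36))        # (0, 65): already tracked
```

Filling later subsets densely, rather than by code modulo the subset size, is what keeps the number of extra subsets small.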

With these changes, the complex/font features/languages tests in #30607 produce correct results.

PR checklist

@QuLogic added this to the v3.11.0 milestone Sep 27, 2025
@github-project-automation bot moved this to Waiting for other PR in Font and text overhaul Sep 27, 2025
@QuLogic moved this from Waiting for other PR to Ready for Review in Font and text overhaul Sep 27, 2025
@QuLogic changed the title from "Deduplicate `CharacterTracker.track` implementation" to "Prepare `CharacterTracker` for advanced font features" Sep 27, 2025
QuLogic (Member, Author) commented:

Note: the original commit was small, and the remaining commits ended up small enough that I just put them all in this PR.

```diff
 print("%%BeginProlog", file=fh)
 if not mpl.rcParams['ps.useafm']:
-    Ndict += len(ps_renderer._character_tracker.used)
+    Ndict += sum(map(len, ps_renderer._character_tracker.used.values()), 0)
```
A contributor commented:

The `0` at the end is unneeded.
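Python's built-in `sum` already uses `0` as its default `start` value, so both spellings below give the same total, including for an empty mapping. The `used` dictionary here is a hypothetical stand-in; the real structure of `_character_tracker.used` is not shown in this thread.

```python
# Hypothetical stand-in for `_character_tracker.used`: font -> set of tracked codes.
used = {"font-a": {65, 66}, "font-b": {1, 2, 3}}

with_start = sum(map(len, used.values()), 0)
without_start = sum(map(len, used.values()))
print(with_start, without_start)  # 5 5

# The default start value also covers the empty case.
print(sum(map(len, {}.values())))  # 0
```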

There is no need to repeat the calculation of subset blocks; instead, offload it to `track_glyph`.

Instead of splitting fonts into `subset_size` blocks and writing text as character code modulo `subset_size`, compress the blocks by doing two things:

1. Preserve the character code if it lies in the first block. This keeps ASCII (for Type 3) and the Basic Multilingual Plane (for Type 42) as their normal codes.
2. Push everything else into the next spot in the next block, splitting by `subset_size` as necessary.

This should reduce the number of additional font subsets to embed.

If mixing languages, sometimes a single character may use different glyphs in one document. In that case, we need to give it a new character code in the next subset, since subset 0 is preserving character codes.
QuLogic (Member, Author) commented:

OK, I've handled all your comments, I think. I also fixed subsetting in the PostScript backend, noted above.

There are 3 test image changes:

  • 2 PDF tests think that the fourth emoji moved slightly; I think this is because the characters on the line now come from the same font instead of being split between emoji and non-emoji fonts. The shift is very small, though: viewed at 100%, there doesn't really seem to be any difference.
  • 1 EPS test change; this is because subsetting is now "real", so the switch from type 3 to type 42 no longer happens. Ghostscript seems to rasterize those fonts a little differently, but on the upside, the Computer Modern and DejaVu fonts now look the same.

anntzer (Contributor) left a comment:

Just a minor point regarding a comment.

For ligatures or complex shaping, multiple characters may map to a single glyph. In this case, we still want to output a single character code for the string using the font subset, but the `ToUnicode` map should give back all the characters.
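In a PDF `ToUnicode` CMap, a `bfchar` entry maps one source code to a destination string of UTF-16BE code units, and that destination may hold several characters; this is what lets a single ligature glyph map back to its full text. The helpers below are a hypothetical sketch of formatting such entries, not matplotlib's PDF backend code; the choice of one-byte codes for type 3 and two-byte codes for type 42 is an assumption for illustration.

```python
def bfchar_entry(subset_code, chars, code_bytes):
    """Format one hypothetical `bfchar` line: <source code> <UTF-16BE text>."""
    src = subset_code.to_bytes(code_bytes, "big").hex().upper()
    dst = chars.encode("utf-16-be").hex().upper()
    return f"<{src}> <{dst}>"


def bfchar_section(entries):
    """Wrap (subset_code, chars, code_bytes) tuples in a bfchar section."""
    lines = [f"{len(entries)} beginbfchar"]
    lines += [bfchar_entry(*entry) for entry in entries]
    lines.append("endbfchar")
    return "\n".join(lines)


print(bfchar_section([
    (0x41, "A", 1),           # ordinary character keeping its own code
    (0x01, "ffi", 1),         # one ligature glyph, three characters
    (0x02, "\U0001D6FC", 1),  # non-BMP character becomes a surrogate pair
]))
```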
Previously, this was supposed to "upgrade" type 3 to type 42 if the number of glyphs overflowed. However, as `CharacterTracker` can suggest a new subset for other reasons (i.e., multiple glyphs for the same character or a glyph for multiple characters may go to a second subset), we do need proper subset handling here as well.

Since that is now done, we can drop the "promotion" from type 3 to type 42, as we don't get too many glyphs in each embedded font.
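Once every tracked glyph carries a `(subset index, code)` pair, a backend can emit one font per subset, and because each code is below `subset_size`, no embedded font can hold more than `subset_size` glyphs. A minimal sketch, assuming a mapping from `(chars, glyph)` keys to `(subset, code)` pairs; `group_by_subset` is hypothetical and is not the PostScript backend's actual code.

```python
from collections import defaultdict


def group_by_subset(mapping):
    """Group {(chars, glyph): (subset, code)} into {subset: {code: glyph}}."""
    fonts = defaultdict(dict)
    for (_chars, glyph), (subset, code) in mapping.items():
        fonts[subset][code] = glyph
    return dict(fonts)


subset_size = 256  # assumed type 3 limit
mapping = {
    ("A", 36): (0, 65),
    ("A", 900): (1, 0),    # same character, second glyph
    ("fi", 1234): (1, 1),  # ligature
}
fonts = group_by_subset(mapping)
print(fonts)  # {0: {65: 36}, 1: {0: 900, 1: 1234}}

# Each subset becomes its own embedded type 3 font that fits in 256 slots,
# which is why promoting the font to type 42 is no longer needed.
assert all(len(glyphs) <= subset_size for glyphs in fonts.values())
```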
QuLogic (Member, Author) commented:

Removed the image changes (and moved them to the `text-overhaul-figures` branch) in preparation for merging.

QuLogic (Member, Author) commented:

Linting issues are known (#30626), so merging over those.

@QuLogic merged commit ed4ca6c into matplotlib:text-overhaul Oct 2, 2025
35 of 36 checks passed
@github-project-automation bot moved this from Ready for Review to Done in Font and text overhaul Oct 2, 2025
@QuLogic deleted the simpler-track branch October 2, 2025 23:00
Reviewers

@anntzer approved these changes

@ksunden approved these changes

Assignees: No one assigned
Projects: Font and text overhaul (Status: Done)
Milestone: v3.11.0

3 participants: @QuLogic, @anntzer, @ksunden
