For character codes outside the embedded font limits (256 for type 3 and 65536 for type 42), we output them as XObjects instead of using text commands. But there is nothing in the PDF spec that requires any specific encoding like this.

Since we now support subsetting all fonts before embedding, split each font into groups based on the maximum character code (e.g., 256-entry groups for type 3), then switch text strings to a different font subset and re-map character codes to it when necessary.
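
As a rough illustration of the splitting described above, character codes can be grouped by block and re-mapped to codes within that block. This is only a sketch: `split_into_subsets` and its return shape are hypothetical and are not the actual `CharacterTracker` API.

```python
from collections import defaultdict


def split_into_subsets(text, block_size=256):
    """Group code points by block.

    Returns {subset_index: {original_code: code_within_subset}}.
    Hypothetical helper for illustration only.
    """
    subsets = defaultdict(dict)
    for char in text:
        code = ord(char)
        # The quotient selects the font subset; the remainder is the
        # re-mapped character code inside that subset.
        subset_index, new_code = divmod(code, block_size)
        subsets[subset_index][code] = new_code
    return dict(subsets)


subsets = split_into_subsets("AÿĀ😀")
for index, mapping in sorted(subsets.items()):
    print(index, {hex(k): hex(v) for k, v in mapping.items()})
```

With a block size of 256 (the type 3 limit), `A` and `ÿ` stay in subset 0, while `Ā` and the emoji each land in a later subset with a small re-mapped code.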

This means all text is true text (albeit with some strange encoding), and we no longer need any XObjects for glyphs. For users of non-English text, this means it will become selectable and copyable again.

There are 3 steps to achieve this change:

1. Track both character codes and glyphs in CharacterTracker. This class takes care of splitting characters into subsets that fit the desired PDF font type limits. -> moved to pdf/ps: Track full character map in CharacterTracker #30566
2. Output each used font block as a separate subsetted font. Also change the subset prefix to use the glyph indices, which are unique, unlike the character codes. -> first commit here
3. Generate a ToUnicode dictionary for the subset font. We already did this for type 42 fonts, but the implementation was incorrect as it didn't correctly handle non-BMP characters. For type 3, support was added in PDF 1.2 and we produce 1.4, so it is available; there is a fallback to the glyph names, but it is inconsistent and probably depends on the original font having the right names. -> second commit here
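
To make the non-BMP point in the ToUnicode step concrete, here is a minimal sketch of encoding code points as UTF-16BE hex strings in the shape of `bfchar` entries. The helper names are hypothetical, and this is not the code in `backend_pdf.py`.

```python
def to_unicode_hex(codepoint):
    """Return the UTF-16BE encoding of a code point as uppercase hex."""
    return chr(codepoint).encode("utf-16-be").hex().upper()


def bfchar_lines(mapping):
    """Format {subset_code: codepoint} as bfchar-style entries.

    Assumes single-byte subset codes, as for a 256-entry subset.
    """
    return [f"<{code:02X}> <{to_unicode_hex(cp)}>"
            for code, cp in sorted(mapping.items())]


# U+0100 fits in one UTF-16 code unit; U+1F600 needs a surrogate pair,
# which a plain 16-bit (UCS-2 style) encoding cannot represent.
for line in bfchar_lines({0x00: 0x100, 0x01: 0x1F600}):
    print(line)
```

The emoji comes out as the surrogate pair `D83DDE00`, which is the case a UCS-2 style encoding gets wrong.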

In the future, we may wish to extend the implementation in CharacterTracker to "compress" the character map it produces (e.g., if you use 255 characters, each from a different 256-sized block, with type 3, you get 255 fonts, but we could compress that to a single font). I tried to avoid hard-coding any assumptions that the mapping is block-by-block, but it is possible that something slipped through, so I do not want to spend too much time on that right now.
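
One possible shape for such compression, purely as a sketch (the function below is hypothetical and nothing like it exists in this PR), is to number the used code points densely instead of by block:

```python
def compress_charmap(codepoints, block_size=256):
    """Pack used code points densely.

    Returns {codepoint: (subset_index, code_within_subset)}.
    Hypothetical helper; not part of CharacterTracker.
    """
    unique = sorted(set(codepoints))
    return {cp: divmod(position, block_size)
            for position, cp in enumerate(unique)}


# 255 characters, each from a different 256-sized block.
sparse = [block * 256 for block in range(1, 256)]

blockwise_fonts = {cp // 256 for cp in sparse}
compressed = compress_charmap(sparse)
compressed_fonts = {subset for subset, _ in compressed.values()}

print(len(blockwise_fonts), len(compressed_fonts))
```

Block-by-block splitting needs one font per character here, while dense packing fits all 255 characters into a single subset.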

Formerly, with multi_font_type3.pdf (after adding the emoji to the test), copying the text in evince would produce:

There are basic charactersABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789 !”#$%&’()*+,-./:;¡=¿?@[“]ˆ˙‘—–˝˜and accented charactersÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿin between!

and with multi_font_type42.pdf:

There are basic charactersABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789 !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~and accented charactersÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳǴǵǶǷǸǹǺǻǼǽǾǿȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗȘșȚțȜȝȞȟȠȡȢȣȤȥȦȧȨȩȪȫȬȭȮȯȰȱȲȳȴȵȶȷȸȹȺȻȼȽȾȿɀɁɂɃɄɅɆɇɈɉɊɋɌɍɎɏin between!

and now we get for both type 3 and 42:

There are basic charactersABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789 !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~and accented charactersÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳǴǵǶǷǸǹǺǻǼǽǾǿȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗȘșȚțȜȝȞȟȠȡȢȣȤȥȦȧȨȩȪȫȬȭȮȯȰȱȲȳȴȵȶȷȸȹȺȻȼȽȾȿɀɁɂɃɄɅɆɇɈɉɊɋɌɍɎɏ😀😁😂😃😄😅😆😇😈😉😊😋😌😍😎😏in between!

Note how, in the third line (digits and punctuation) of the type 3 output:

- the quotes are 'curly' instead of straight quotes
- the chevrons <> are inverted exclamation/question marks ¡¿
- the backslash \ is a curly opening double quote “
- the caret ^, underscore _, and tilde ~ are {circumflex, dot, tilde} accents/smaller glyphs ˆ˙˜
- the braces {} are em-dash and curly quotes —˝
- the pipe | is en-dash –
Everything from the seventh to second-last line is missing in type 3 since it's outside of the 256 limit, and all the emoji are missing from type 42 since that's outside the 65536 limit.
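
The two limits can be checked directly against a few characters from the test string. This is only an illustrative sketch; the constant and function names are made up here.

```python
TYPE3_LIMIT = 256      # character codes available in a type 3 font
TYPE42_LIMIT = 65536   # character codes available in a type 42 font


def fits(char, limit):
    """Return True if the character's code point is below the limit."""
    return ord(char) < limit


for char in "~ÿĀɏ😀":
    print(f"U+{ord(char):04X}",
          "type 3:", fits(char, TYPE3_LIMIT),
          "type 42:", fits(char, TYPE42_LIMIT))
```

Characters from `Ā` onwards fall outside the type 3 limit, and only the emoji fall outside the type 42 limit, which matches what went missing in the old output.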

~~This depends on #30520, #30335, #30566, and #30567.~~

PR checklist

- "closes #0000" is in the body of the PR description to link the related issue
- new and changed code is tested
- [n/a] Plotting related features are demonstrated in an example
- New Features and API Changes are noted with a directive and release note
- Documentation complies with general and docstring guidelines

QuLogic added this to the v3.11.0 milestone

Sep 4, 2025

QuLogic added this to Font and text overhaul

Sep 4, 2025

QuLogic added the status: waiting for other PR label

Sep 4, 2025

github-project-automation bot moved this to Waiting for other PR in Font and text overhaul

Sep 4, 2025

github-actions bot added the topic: text, backend: ps, backend: pdf, backend: svg, backend: cairo, and topic: text/mathtext labels

Sep 4, 2025

QuLogic force-pushed the pdf-text-subsets branch from 7ffffb5 to 3fc92f4

September 4, 2025 06:06

QuLogic mentioned this pull request

Sep 4, 2025

TST: Remove redundant font tests #30513

Draft

1 task

anntzer (Contributor) commented Sep 4, 2025

This is great and would also allow getting rid of _get_pdf_charprocs. I'll try to have a look at #30335 to start...

QuLogic mentioned this pull request

Sep 4, 2025

Use glyph indices for font tracking in vector formats #30335

Merged

1 task

anntzer (Contributor) commented Sep 5, 2025 (edited)

The first two commits (the loop merge and the Type3 encoding change) seem independent from the rest (even from the switch to glyph index tracking) and could be merged first via a separate PR? (I can probably approve them right away.)
I still need to properly review the next one (charmap tracking) but that can also come next by itself?

QuLogic mentioned this pull request

Sep 6, 2025

pdf: Simplify Type 3 font character encoding #30520

Merged

1 task

QuLogic (Member, Author) commented Sep 6, 2025

I split the type3 encoding to #30520, but the loop merge has conflicts with the glyph index change.

github-actions bot added the status: needs rebase label

Sep 8, 2025

QuLogic force-pushed the pdf-text-subsets branch from 3fc92f4 to 275fb16

September 16, 2025 05:46

github-actions bot removed the status: needs rebase label

Sep 16, 2025

anntzer reviewed

Sep 16, 2025

lib/matplotlib/backends/_backend_pdf_ps.py (outdated)

anntzer reviewed

Sep 16, 2025

lib/matplotlib/backends/_backend_pdf_ps.py

This was referenced Sep 16, 2025

pdf/ps: Track full character map in CharacterTracker #30566

Merged

pdf: Merge loops for single byte text chunk output #30567

Merged

QuLogic force-pushed the pdf-text-subsets branch from 275fb16 to 60f3a4f

September 16, 2025 06:59

QuLogic (Member, Author) commented Sep 16, 2025

> The first two commits (the loop merge and the Type3 encoding change) seem independent from the rest (even from the switch to glyph index tracking) and could be merged first via a separate PR? (I can probably approve them right away.)

Split the loop merge as well.

anntzer reviewed

Sep 16, 2025

lib/matplotlib/backends/backend_pdf.py (outdated)

anntzer reviewed

Sep 16, 2025

lib/matplotlib/backends/backend_pdf.py

QuLogic force-pushed the pdf-text-subsets branch from 60f3a4f to af3ea7f

September 17, 2025 01:38

github-actions bot removed the topic: text label

Sep 17, 2025

github-actions bot removed the backend: svg, backend: cairo, and topic: text/mathtext labels

Sep 17, 2025

QuLogic linked an issue that may be closed by this pull request

Sep 17, 2025

[Bug]: Math fonts (Type 3) incorrectly embedded in PDF? #21797

Open

QuLogic force-pushed the pdf-text-subsets branch from af3ea7f to e86ca1e

September 19, 2025 07:01

QuLogic removed the status: waiting for other PR label

Sep 19, 2025

QuLogic moved this from Waiting for other PR to Ready for Review in Font and text overhaul

Sep 19, 2025

QuLogic force-pushed the pdf-text-subsets branch from e86ca1e to ad319c7

September 19, 2025 07:34

QuLogic marked this pull request as ready for review

September 19, 2025 07:36

github-actions bot added the status: needs rebase label

Sep 20, 2025

QuLogic force-pushed the pdf-text-subsets branch from ad319c7 to cf9aff6

September 22, 2025 21:20

github-actions bot removed the status: needs rebase label

Sep 22, 2025

tacaswell approved these changes

Sep 25, 2025

anntzer approved these changes

Sep 25, 2025

QuLogic added 4 commits

September 25, 2025 19:05

pdf: Improve text with characters outside embedded font limits

b70fb88

For character codes outside the embedded font limits (256 for type 3 and 65536 for type 42), we output them as XObjects instead of using text commands. But there is nothing in the PDF spec that requires any specific encoding like this.

Since we now support subsetting all fonts before embedding, split each font into groups based on the maximum character code (e.g., 256-entry groups for type 3), then switch text strings to a different font subset and re-map character codes to it when necessary.

This means all text is true text (albeit with some strange encoding), and we no longer need any XObjects for glyphs. For users of non-English text, this means it will become selectable and copyable again.

Fixes matplotlib#21797

pdf: Correct Unicode mapping for out-of-range font chunks

1c4af68

For Type 3 fonts, add a `ToUnicode` mapping (which was added in PDF 1.2), and for Type 42 fonts, correct the Unicode encoding, which should be UTF-16BE, not UCS2.

TST: Add emoji to multi-font text

6cedcf7

These characters are outside the BMP and should test subset splitting for type 42 output in PDF.

DOC: Add a release note for PDF font embedding fixes

c908bbf

QuLogic force-pushed the pdf-text-subsets branch from cf9aff6 to c908bbf

September 26, 2025 00:08

QuLogic (Member, Author) commented Sep 26, 2025

Rebased without images (moved to text-overhaul-figures branch) so that it can be merged.

QuLogic merged commit a1ed4ef into matplotlib:text-overhaul

Sep 26, 2025

34 of 35 checks passed

github-project-automation bot moved this from Ready for Review to Done in Font and text overhaul

Sep 26, 2025

QuLogic deleted the pdf-text-subsets branch

September 26, 2025 01:49

QuLogic mentioned this pull request

Oct 3, 2025

Remove forced fallback from FT2Font::load_char #30627

Open

1 task

Labels

backend: pdf backend: ps

3 participants

pdf: Improve text with characters outside embedded font limits #30512
