Proof of concept: Type42 subsetting in pdf #18143

Closed

jkseppan wants to merge 4 commits into matplotlib:master from jkseppan:subset-type42

Member

jkseppan commented Aug 1, 2020

PR Summary

Use fonttools to subset TrueType fonts when embedding them in Type42 format. This is a somewhat hacky proof of concept, but it seems to work:

    import matplotlib
    from matplotlib import pyplot as plt

    matplotlib.rcParams['pdf.fonttype'] = 42
    plt.plot([3, 1, 4, 1, 5, 9, 2])
    plt.title(r'$\pi$')
    plt.text(1, 5, 'Hellø World! ()℻ǘ ⇐⇑⇒⇓←↑→↓↴↵≀')
    plt.savefig('foo.pdf')

outputs

    SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans-Oblique.ttf characters: π
    SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans-Oblique.ttf 633840 -> 3052
    SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf characters: ←↑→↓ !()0123456789↴℻↵≀H⇐⇑⇒⇓Wǘdelorø
    SUBSET /Users/jks/matplotlib/lib/matplotlib/mpl-data/fonts/ttf/DejaVuSans.ttf 756072 -> 11340

and produces the attached file foo.pdf, which looks fine in at least Preview.app. The debug output shows the size reduction, in bytes, from the original font file to the subset (before compression).
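For readers unfamiliar with fonttools, the subsetting step can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the helper names `collect_characters` and `subset_font_bytes` are hypothetical, and the choice of `glyph_names=True` and `recommended_glyphs=True` is an assumption about sensible options rather than something stated in the description above.

```python
from io import BytesIO


def collect_characters(texts):
    """Return the set of distinct characters used across all strings."""
    chars = set()
    for text in texts:
        chars.update(text)
    return chars


def subset_font_bytes(font_path, chars):
    """Return the bytes of a subset of font_path covering only chars.

    Sketch only: the option choices are assumptions, not the PR's code.
    """
    # Imported lazily so collect_characters works without fonttools installed.
    from fontTools import subset

    options = subset.Options(glyph_names=True, recommended_glyphs=True)
    font = subset.load_font(font_path, options)
    subsetter = subset.Subsetter(options)
    # Tell the subsetter which characters must survive.
    subsetter.populate(text="".join(sorted(chars)))
    subsetter.subset(font)
    buffer = BytesIO()
    font.save(buffer, reorderTables=False)
    return buffer.getvalue()
```

With fonttools installed, something like `subset_font_bytes("DejaVuSans.ttf", collect_characters(["Hellø World!"]))` would return the reduced font data, whose length can be compared against the original file size as in the debug output.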

Do people think this would be worth pursuing? The fonttools library would be a new dependency, but it has been around for a long time and seems to be under development. It does raise a DeprecationWarning that seems quite pointless (you can just comment out the problematic import with no effect) but we could probably send them a PR to fix that. The library can also read and subset OpenType fonts and read Type-1 fonts (but it doesn't seem to include subsetting support for those).

PR Checklist

Has Pytest style unit tests
Code is Flake 8 compliant
New features are documented, with examples if plot related
Documentation is sphinx and numpydoc compliant
Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
Documented in doc/api/next_api_changes/* if API changed in a backward-incompatible way

Proof of concept: Type42 subsetting in pdf

0cac414

jkseppan added topic: text/fonts backend: pdf labels

Aug 1, 2020

jkseppan added 3 commits

August 1, 2020 18:46

flake8

468c52c

Filter out just the py23 warning

9e01aca

More flake8

591f9a8

Member

jklymak commented Aug 1, 2020

Looks fine in Acrobat.

I'm not an authority on extra dependencies, but this one certainly looks reasonable so long as it pip installs on most machines. Looks like it's all Python?

Does this come at a huge speed hit in creating the files? i.e. is it something the user may want to toggle?

Member, Author

jkseppan commented Aug 2, 2020

> I'm not an authority on extra dependencies, but this one certainly looks reasonable so long as it pip installs on most machines. Looks like it's all Python?

Yes, it's pure python. Some related projects are in C++, at least compreffor (something for reducing the size of tables in CFF fonts).

> Does this come at a huge speed hit in creating the files? i.e. is it something the user may want to toggle?

I didn't measure, but on the command line it felt pretty fast.

This would have to be toggleable on a per-font basis, because font subsetting seems to be a bit of an arcane art. Font specifications have evolved over the years and there are many old font files and many PDF consuming applications out there, so I would not be surprised if subsetting some specific font causes some specific PDF viewer to fail to display it.
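One way a per-font toggle could look is sketched below. Everything here is hypothetical: the names `should_subset` and `NO_SUBSET_FONTS`, and the idea of keying the decision on the font file name, are assumptions for illustration only and do not correspond to any existing matplotlib setting.

```python
from pathlib import Path

# Hypothetical denylist of font file names that should be embedded whole,
# for example because some viewer is known to mishandle their subsets.
NO_SUBSET_FONTS = frozenset({"ProblemFont.ttf"})


def should_subset(font_path, enabled=True, denylist=NO_SUBSET_FONTS):
    """Decide whether to subset the font at font_path.

    enabled is a global switch; denylist disables subsetting per font.
    """
    if not enabled:
        return False
    return Path(font_path).name not in denylist
```

A backend could consult such a function before calling the subsetter and fall back to embedding the full font file when it returns False.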

Contributor

anntzer commented Aug 2, 2020

fonttools seems like a reasonable dependency. I don't know how much we want to have type-42 subsetting (as in, is type-3 subsetting really not sufficient?), but I agree that if we do, we more or less have to bring fonttools in.

Member, Author

jkseppan commented Aug 2, 2020

I know that some publishers run a quality check on pdf files and reject them if there are any Type 3 fonts. I think this is because for a long time dvipdf/pdfTeX produced poor-quality Type 3 fonts, basically just TeX Metafonts rendered as bitmaps (since the conversion from Metafont to PostScript is not trivial). Eventually good-quality Type-1 versions of the TeX fonts became available but TeX systems had to be configured to use them, so requiring Type 1 instead of Type 3 was a simple way to ensure acceptable-quality fonts.

These days there probably is little reason for publishers not to accept files with Type 3 fonts, but when you have established that kind of quality check, it's hard to go back. Also I think I've heard that there are some uses of pdf files where Type 42 is actually better than Type 3, although I can't recall any details. Perhaps Asian language support? I'm sure there's some reason that both kinds of embeddings have been implemented.

anntzer mentioned this pull request

Aug 6, 2020

PostScript Type42 embedding is broken in various ways #18191

Closed

anntzer mentioned this pull request

Aug 20, 2020

Type42 font embedding broken for fonts without glyph names #18307

Closed

aitikgupta mentioned this pull request

Mar 2, 2021

Add kerning to single-byte strings in PDFs #19582

Merged

7 tasks

Member

QuLogic commented May 6, 2021

So is the only thing holding this up verifying whether it might break something? Or is there some more implementation to be done?

jkseppan commented Jun 8, 2021

lib/matplotlib/testing/conftest.py

		("markers", "pytz: Tests that require pytz to be installed."),
		("filterwarnings", "error"),
		("filterwarnings",
		 "ignore:.*The py23 module has been deprecated:DeprecationWarning"),

Member, Author

jkseppan Jun 8, 2021

This is probably not needed any more: see fonttools/fonttools#2035
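The effect of that pair of filters can be reproduced with the standard `warnings` module, as in the sketch below: every warning becomes an error except a `DeprecationWarning` whose message matches the py23 pattern. The helper names `install_filters` and `emits_error` are hypothetical and exist only for this illustration.

```python
import warnings

# Same regular expression as in the conftest entry above.
PY23_PATTERN = r".*The py23 module has been deprecated"


def install_filters():
    """Turn warnings into errors, then exempt the py23 deprecation."""
    warnings.simplefilter("error")
    # Filters added later are checked first, so this exemption wins.
    warnings.filterwarnings(
        "ignore", message=PY23_PATTERN, category=DeprecationWarning)


def emits_error(message, category=DeprecationWarning):
    """Return True if warning with this message is raised as an error."""
    with warnings.catch_warnings():
        install_filters()
        try:
            warnings.warn(message, category)
        except category:
            return True
        return False
```

Only the specific message is exempted, so any other deprecation raised during the tests still fails the run.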

jkseppan commented Jun 8, 2021

lib/matplotlib/backends/backend_pdf.py

		with tempfile.NamedTemporaryFile(suffix='.ttf') as tmp:
		    tmp.write(fontdata)
		    tmp.seek(0, 0)
		    font = FT2Font(tmp.name)

Member, Author

jkseppan Jun 8, 2021

Reloading the FT2Font object is a bit ugly, and I think it is only needed here to get the glyph widths, the cid to gid map and the unicode mapping. These could probably be obtained otherwise. On the other hand, reusing the old code makes this patch smaller.
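If the reload-through-a-file approach is kept, one possible variation is sketched below. It closes the temporary file before handing its path to the loader, because reopening a still-open `NamedTemporaryFile` by name works on Unix but not on Windows. The function name `load_via_temp_file` and the generic `loader` callback are hypothetical; in the diff above the loader would be `FT2Font`.

```python
import os
import tempfile


def load_via_temp_file(fontdata, loader, suffix=".ttf"):
    """Write fontdata to a temporary file and return loader(path).

    The file is closed before the loader runs and removed afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(fontdata)
        # The data is flushed and the handle released at this point.
        return loader(path)
    finally:
        os.unlink(path)
```

This assumes the loader reads everything it needs before returning; a loader that keeps the file open lazily would need the file to outlive the call.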