Improve Type-1 font parsing #20715
981ef67 to 90b5889
With this I can produce smaller pdf files with usetex in some small tests, but this obviously needs more extensive testing, thus marking as draft. On top of matplotlib#20634 and matplotlib#20715. Closes matplotlib#127.
e35728b to 9418b35
With this I can produce smaller pdf files with usetex in some small tests, but this obviously needs more extensive testing, thus marking as draft. On top of matplotlib#20715. Closes matplotlib#127.
19119d8 to 25613b9
Thanks for the review @anntzer!
Move Type1Font._tokens into a top-level function _tokenize that is a coroutine. The parsing stage consuming the tokens can instruct the tokenizer to return a binary token - this is necessary when decrypting the CharStrings and Subrs arrays, since the preceding context determines which parts of the data need to be decrypted.

The function now also parses the encrypted portion of the font file. To support usage as a coroutine, move the whitespace filtering into the function, since passing the information about binary tokens would not easily work through a filter.

The function now returns tokens as subclasses of a new _Token class, which carry the position and value of the token and can have token-specific helper methods. The position data will be needed when modifying the file, as the font is transformed or subsetted.

A new helper function _expression can be used to consume tokens that form a balanced subexpression delimited by [] or {}. This helps fix a bug in UniqueID removal: if the font includes PostScript code that checks if the UniqueID is set in the current dictionary, the previous code broke that code instead of removing the UniqueID definition. Fonts can include UniqueID in the encrypted portion as well as the cleartext one, and removal is now done in both portions.

Fix a bug related to font weight: the key is title-cased and not lower-cased, so font.prop['weight'] should not exist.
Type-1 fonts are required to have subroutines with specific contents but their names may vary. They are usually ND, NP and RD but names like | and |- appear too.
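The coroutine-style tokenizer and the balanced-expression helper described in these commit messages can be illustrated with a small standalone sketch. This is not the code from the PR: the names `Token`, `tokenize`, `read_tokens` and `expression` are hypothetical, the lexing rules are heavily simplified (no strings, comments or hex data), and the consumer assumes the binary-read subroutine is literally named RD.

```python
import re
from typing import Generator, List, NamedTuple, Optional


class Token(NamedTuple):
    """Hypothetical token record: kind, byte offset, and raw bytes."""
    kind: str
    pos: int
    raw: bytes


# Simplified lexing rules (an assumption, not the rules used in the PR).
_WHITESPACE = re.compile(rb'[\0\t\r\f\n ]+')
_TOKEN = re.compile(rb'/?[^\0\t\r\f\n ()<>{}\[\]/%]+|[\[\]{}]')


def tokenize(data: bytes) -> Generator[Token, Optional[int], None]:
    """Yield tokens; the consumer may send(n) to receive n raw bytes next."""
    pos = 0
    request = None
    while True:
        if request:
            # Binary token: skip the single separator byte, then return
            # the next `request` bytes without interpreting them.
            start = pos + 1
            token = Token('binary', start, data[start:start + request])
            pos = start + request
        else:
            ws = _WHITESPACE.match(data, pos)
            if ws:
                pos = ws.end()
            if pos >= len(data):
                return
            match = _TOKEN.match(data, pos)
            if match is None:
                raise ValueError(f'cannot tokenize at offset {pos}')
            raw = match.group()
            kind = 'delimiter' if raw in (b'[', b']', b'{', b'}') else 'word'
            token = Token(kind, pos, raw)
            pos = match.end()
        request = yield token


def read_tokens(data: bytes) -> List[Token]:
    """Example consumer: after 'n RD', ask the tokenizer for n raw bytes."""
    tokens = tokenize(data)
    result: List[Token] = []
    for token in tokens:
        result.append(token)
        if token.kind == 'word' and token.raw == b'RD':
            length = int(result[-2].raw)
            result.append(tokens.send(length))
    return result


def expression(first: Token, tokens) -> List[Token]:
    """Collect tokens up to the delimiter that balances `first`."""
    depth = 1
    result = [first]
    for token in tokens:
        result.append(token)
        if token.kind == 'delimiter':
            depth += 1 if token.raw in (b'[', b'{') else -1
            if depth == 0:
                break
    return result
```

Because only the consumer knows that the number before RD is a byte count, it is the consumer that switches the tokenizer into binary mode through `send()`. A plain iterator wrapped in a whitespace filter would have no channel for that request, which is the motivation the commit message gives for moving the filtering into the tokenizer.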
25613b9 to e98bb83
I added some tests and realized that the string escaping was slightly wrong. Now the code also parses string values, although it is unlikely that font properties would include escaped whitespace characters.
I admit I still haven't followed all the logic, but we can always check again later :)
With this I can produce smaller pdf files with usetex in some small tests, but this obviously needs more extensive testing, thus marking as draft. Give dviread.DviFont a fake filename attribute for character tracking. On top of matplotlib#20715. Closes matplotlib#127.
        depth += 1
    elif match.group() == ')':
        depth -= 1
    else:  # a backslash
I guess that here you don't really care about handling the backslash escapes, and all you want to do is simply to match the (unescaped) parentheses, so you could perhaps just replace instring_re by something like (?<!\\)[()] (parentheses not preceded by a backslash)?
Ah, I gave this a try but it wouldn't work because the parenthesis can be preceded by backslashes if the backslash is itself escaped (i.e. one would really need to search for "parenthesis not preceded by an odd number of backslashes"). So let's forget about this for now.
I'll merge based on @anntzer's review. This should not go into 3.5 so that it has time to be used on master...
PR Summary
Parse font properties also from the encrypted part of the file, and reimplement the parsing so it understands more of PostScript's syntax. This fixes a bug where Type1Font.transform would not remove the UniqueID key but break some PostScript code referring to UniqueID instead.

Incidentally, fix the bug where every font had a weight property with value 'Normal' - the correct property is spelled Weight with a capital letter.

This is a prerequisite for subsetting Type-1 fonts (#127).
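For readers unfamiliar with the "encrypted part" mentioned in the summary: Type-1 fonts obscure that section with the simple stream cipher defined in Adobe's Type 1 Font Format specification. The sketch below shows that cipher in isolation. The function names `decrypt` and `encrypt` are hypothetical and are not presented as the PR's API, and using zero bytes as the discarded prefix is a simplification (real fonts use random bytes).

```python
def decrypt(ciphertext: bytes, key: int = 55665, ndiscard: int = 4) -> bytes:
    """Decrypt Type-1 data; key 55665 is for eexec, 4330 for charstrings."""
    r = key
    plain = bytearray()
    for byte in ciphertext:
        plain.append(byte ^ (r >> 8))
        # The key stream depends on the ciphertext byte just consumed.
        r = ((byte + r) * 52845 + 22719) & 0xFFFF
    # The first `ndiscard` plaintext bytes are filler and are dropped.
    return bytes(plain[ndiscard:])


def encrypt(plaintext: bytes, key: int = 55665, ndiscard: int = 4) -> bytes:
    """Inverse of decrypt(); prepends `ndiscard` filler bytes (zeros here)."""
    r = key
    cipher = bytearray()
    for byte in b'\0' * ndiscard + plaintext:
        c = byte ^ (r >> 8)
        cipher.append(c)
        r = ((c + r) * 52845 + 22719) & 0xFFFF
    return bytes(cipher)
```

Since a definition such as UniqueID can appear in this section as well as in the cleartext one, removing it from both requires decrypting the section, editing the plaintext, and encrypting the result again.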
PR Checklist
- pytest passes.
- flake8 on changed files to check.
- flake8-docstrings: run flake8 --docstring-convention=all.
- doc/users/next_whats_new/ (follow instructions in README.rst there).
- doc/api/next_api_changes/ (follow instructions in README.rst there).