Add support for more accents in mathtext #23189

Draft: oscargus wants to merge 2 commits into matplotlib:main from oscargus:moreaccentsabove

oscargus (Member)

PR Summary

Add support for \check (#7738) and the brief forms in https://en.wikibooks.org/wiki/LaTeX/Special_Characters (double acute is new; the others just use the standard single-letter names).

In addition, this replaces a character plus a combining accent with a single precomposed character when one is available, as mentioned in #4561 (comment). This means that e.g. \" i now works and is properly replaced with ï.
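A minimal sketch of the replacement described above, using only the standard-library unicodedata module: NFC normalization folds a base letter plus a combining accent into one precomposed code point when Unicode defines one, and leaves the pair alone otherwise. The helper name compose is hypothetical and not Matplotlib API.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"  # U+0308, decimal 776


def compose(base: str, accent: str) -> str:
    """Return the NFC form of base followed by a combining accent.

    NFC yields a single precomposed code point when Unicode defines one;
    otherwise the two-character sequence comes back unchanged.
    """
    return unicodedata.normalize("NFC", base + accent)


print(compose("i", COMBINING_DIAERESIS))       # ï
print(len(compose("p", COMBINING_DIAERESIS)))  # 2 (no precomposed p with diaeresis)
```

Note that this only answers whether Unicode has a precomposed character; whether the font in use has a glyph for it is the separate question in the list below.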

  • Check how this works with cmr10
  • Currently it does not check whether the combined single character exists in the font; I have no idea how to do that efficiently (maybe add a kwarg and/or rcParam so that this can be turned off?)
  • Add tests
  • Add release note

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@oscargus (Member, Author)

The remaining errors after removing the single-letter cases above (keeping H) are:

[Images: mathtext_cm_26 (now) and mathtext_cm_26-expected (earlier)]

So this is a consequence of the actual characters being used.

[Images: mathtext_cm_77 (now) and mathtext_cm_77-expected (earlier)]

The addition of check leads to the checkmark being used here. I do not really understand this test, nor the use of _accentprefixed.

[Images: mathtext_cm_00 (now) and mathtext_cm_00-expected (earlier)]

So this is a consequence of the combined character not being in the font.

For the first case, and I assume the second, the right thing would be to update the images.

For the final case, there should be some check of whether the glyph exists in the font being used.

oscargus force-pushed the moreaccentsabove branch 2 times, most recently from 897487e to df3add6 (June 6, 2022 12:32)
@anntzer (Contributor)

accentprefixed is being handled (removed) in #22950.

@oscargus (Member, Author)

@anntzer Do you know if #22950 will enable using single-character accents whose name is also the start of another LaTeX symbol's name?

Also, do you have any idea how one can detect whether a glyph actually exists, as in ṡ turning into ¤ in the image above? (I do not think it is Matplotlib that does that substitution, or is it?)
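The prefix problem behind the first question (and behind _accentprefixed) is normally settled by TeX's own tokenization rule: a control word is a backslash followed by the longest run of letters, and a control symbol is a backslash plus exactly one non-letter. The regex below is a minimal sketch of that rule only; it is not what #22950 implements, and tokenize is a hypothetical name. Under this rule \checkmark is never read as \check followed by "mark", and a letter-named accent such as \c is only recognized when a non-letter (a space or brace) follows it.

```python
import re

# TeX-style tokens: a control word (backslash + longest run of ASCII letters),
# a control symbol (backslash + one non-letter), or any other non-space char.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize(s: str) -> list[str]:
    """Split a mathtext-like string into TeX-style tokens."""
    return TOKEN.findall(s)


print(tokenize(r"\checkmark"))  # ['\\checkmark']
print(tokenize(r"\check{a}"))   # ['\\check', '{', 'a', '}']
print(tokenize(r"\c c \cdot"))  # ['\\c', 'c', '\\cdot']
print(tokenize(r'\" i'))        # ['\\"', 'i']
```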

@anntzer (Contributor)

First point: yes, I think that should work.
Second point: I think the relevant parts are around this code:

    glyphindex = font.get_char_index(uniindex)
    if glyphindex != 0:
        found_symbol = True
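Building on that check, the policy from the PR summary (use the precomposed character only if the concrete font actually has a glyph for it, otherwise keep the base character plus a separately placed accent) can be sketched independently of Matplotlib's internals. The function choose_glyphs and the has_glyph callable are hypothetical; in Matplotlib, has_glyph would presumably wrap something like font.get_char_index(ord(ch)) != 0 on the concrete font, as in the excerpt above, but that wiring is an assumption.

```python
import unicodedata
from typing import Callable


def choose_glyphs(base: str, accent: str,
                  has_glyph: Callable[[str], bool]) -> list[str]:
    """Return the characters to render for base + combining accent.

    A single precomposed character is preferred, but only if the font has
    a glyph for it; otherwise fall back to the base character followed by
    the combining accent, which the accent code would position as before.
    """
    composed = unicodedata.normalize("NFC", base + accent)
    if len(composed) == 1 and has_glyph(composed):
        return [composed]
    return [base, accent]


# Hypothetical font coverage: it has ï (U+00EF) but not ṡ (U+1E61).
covered = {"i", "s", "\u00ef"}
print(choose_glyphs("i", "\u0308", covered.__contains__))       # ['ï']
print(len(choose_glyphs("s", "\u0307", covered.__contains__)))  # 2: fall back
```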

@oscargus (Member, Author)

Thanks! Ahh, I knew I had seen that somewhere! Grepped for ¤ though...

@oscargus (Member, Author) commented Jun 12, 2022 (edited)

I'm wondering whether one should introduce an rcParam for the replacement. If I understand correctly, the parser may not be able to know the exact font being used (only something like 'rm')?

Edit: Inkscape was not in the path due to a reinstall...

Also, it seems that the SVG output actually handles ṡ, but the PDF and PNG outputs do not. Checking the source, it seems that something converts the combined character back into a combining accent plus a character. Not sure what, though.

Anyway, I am wondering whether one should try to decompose the characters once the _get_glyph operation fails.

Example (no longer relevant, but may still be of interest):

    import unicodedata

    accent = chr(775)  # U+0307 COMBINING DOT ABOVE
    withcombiningaccent = 's' + accent
    print(withcombiningaccent, len(withcombiningaccent))
    combined = unicodedata.normalize('NFC', withcombiningaccent)
    print(combined, len(combined))
    print(ord(combined))

This shows that it correctly finds https://www.codetable.net/decimal/7777 (U+1E61, ṡ).

One can do unicodedata.normalize('NFD', chr(7777)) to get the two characters back again.

However, in the svg output
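The decomposition fallback suggested in this comment (decompose once the _get_glyph operation fails) can be sketched with NFD: a precomposed character is split into its base and combining marks, which the existing accent code could then place. The function name fallback_sequence and the has_glyph callable standing in for the font lookup are hypothetical.

```python
import unicodedata
from typing import Callable


def fallback_sequence(ch: str, has_glyph: Callable[[str], bool]) -> str:
    """Return ch if the font has a glyph for it, else its NFD decomposition.

    Characters without a canonical decomposition come back unchanged, so
    the caller still has to handle the missing glyph in that case.
    """
    if has_glyph(ch):
        return ch
    return unicodedata.normalize("NFD", ch)


s_dot = chr(7777)  # U+1E61 LATIN SMALL LETTER S WITH DOT ABOVE
parts = fallback_sequence(s_dot, lambda c: False)  # pretend the glyph is missing
print([hex(ord(c)) for c in parts])  # ['0x73', '0x307']
```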

@oscargus (Member, Author)

I also replaced some of the accents with the "proper" combining accents. This breaks another test, but avoids having to resize \circ.

@@ -999,9 +999,14 @@
'combiningdiaeresis' : 776,
'combiningtilde' : 771,
'combiningrightarrowabove' : 8407,
'combiningleftarrowabove' : 8406,
oscargus (Member, Author)

A bit of alignment is required here and a few lines down.

@anntzer (Contributor)

Perhaps split out the addition of new accents as a separate PR, which should be fairly uncontroversial?

I suspect that general handling of combining characters would basically require harfbuzz (which knows how to position an accent by itself, e.g. the classic "zalgo" text h̷̡̦͚́͛̅̔̅̊͘ě̶͚̣̭́̉͜ļ̴͚͙̝̑̒l̸̛̙̹ͅơ̵͎̻͔̯̊ ̶̨̨͖̥̺͓̽̋̒͝w̶̨̗̻̥̜͍̮̏͛͒͝o̷̟͆̍̓̚ŗ̵̢͔̦̑͗̑̑̃l̸̲̥̲̹͖̔̇̾̏͆d̴͍̲̓̄̑̉̌̇͜) + switching from bakoma to lm-math, to have access to the combining characters... Still,

If I understand it correctly, it may not be possible for the parser to actually know the exact font being used?

I think that's actually possible? E.g. Char._update_metrics does self._metrics = self.font_output.get_metrics(self.font, self.font_class, self.c, self.fontsize, self.dpi), which loads the metrics of a glyph in the current concrete font, so that code can certainly check whether the glyph exists in that font.

oscargus force-pushed the moreaccentsabove branch 2 times, most recently from ce1fa41 to e671b41 (June 12, 2022 14:38)
@oscargus (Member, Author)

You are correct that it was possible. I couldn't properly follow the order in which things happen.

I think that the zalgo support is actually not much affected by this. It is just that when proper glyphs are available they will be used; if not, it will be as before (which I guess supported zalgo to some extent). See for example the test with r'$\mathring{A} \AA$', where both characters now render identically (Å). (This is a rather good test for this feature, possibly with a few more Unicode characters.)
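Because \mathring{A} and \AA now render the same, a table of (accent, base, expected character) cases would make it easy to extend that test to a few more Unicode characters. This sketch only checks the Unicode composition that the replacement relies on, not the rendering; the accent-name table is a hypothetical stand-in for Matplotlib's own mapping, and in an image-comparison test the same table could drive one mathtext string per case.

```python
import unicodedata

# Hypothetical accent-name -> combining character table.
COMBINING = {
    "mathring": "\u030a",  # COMBINING RING ABOVE
    "ddot": "\u0308",      # COMBINING DIAERESIS
    "tilde": "\u0303",     # COMBINING TILDE
    "acute": "\u0301",     # COMBINING ACUTE ACCENT
    "check": "\u030c",     # COMBINING CARON
}

# (accent name, base letter, expected precomposed character)
CASES = [
    ("mathring", "A", "\u00c5"),  # Å
    ("ddot", "i", "\u00ef"),      # ï
    ("tilde", "n", "\u00f1"),     # ñ
    ("acute", "e", "\u00e9"),     # é
    ("check", "z", "\u017e"),     # ž
]

for accent, base, expected in CASES:
    got = unicodedata.normalize("NFC", base + COMBINING[accent])
    assert got == expected, (accent, base, got)
print(f"{len(CASES)} cases compose to the expected character")
```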

One may even consider checking whether a Unicode character can be split.

Anyway, this should really wait until #22950 is merged so that more accents can be added. One could also consider adding support for other combining accents, like cedilla and ogonek, which should at least work when combined characters are available. Maybe there should be two separate groups of accents: the current ones, where it is possible to "create" decent-looking combinations, and those like cedilla and ogonek, which may have a valid combined glyph. For the latter, one could raise an error if they do not combine or the glyph is not available.
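One possible way to form those two groups is Unicode's canonical combining class: marks above the base (class 230) can still be stacked as is done today, while cedilla and ogonek (class 202, attached below) are accepted only when a precomposed character exists, with an error otherwise. Using the combining class as the criterion is an assumption of this sketch, since the comment does not say how the groups would be defined, and accent_char is a hypothetical name.

```python
import unicodedata

ABOVE = 230  # canonical combining class for marks placed above the base


def accent_char(base: str, mark: str) -> str:
    """Return the character(s) to typeset for base + combining mark.

    Above-marks may stay as base + mark (the layout code stacks them);
    other marks, such as cedilla and ogonek, require a precomposed character.
    """
    composed = unicodedata.normalize("NFC", base + mark)
    if len(composed) == 1 or unicodedata.combining(mark) == ABOVE:
        return composed
    raise ValueError(
        f"no precomposed character for {base!r} with U+{ord(mark):04X}")


print(accent_char("c", "\u0327"))  # ç (cedilla composes)
print(accent_char("a", "\u0328"))  # ą (ogonek composes)
try:
    accent_char("p", "\u0327")     # no precomposed p with cedilla
except ValueError as err:
    print(err)                     # no precomposed character for 'p' with U+0327
```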

(I tried to get combining accents below working, but had some issues aligning them correctly, especially since cedilla and ogonek should attach without a gap, and I didn't get that to work for e.g. p, which probably no one wants, but still...)

Some more things have now changed:

  • macron and overline are different
  • if possible, a dotless i is used (as LaTeX does nowadays)
  • there are a number of new test images, primarily for illustration, as I expect them to change (note that \check is not working)
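The dotless-i change could be sketched as a base substitution applied only when no precomposed character exists and the accent sits above the letter: i and j become ı (U+0131) and ȷ (U+0237). The function name accented and this exact rule are assumptions; the list above only says a dotless i is used when possible.

```python
import unicodedata

DOTLESS = {"i": "\u0131", "j": "\u0237"}  # ı, ȷ
ABOVE = 230  # canonical combining class for marks placed above


def accented(base: str, mark: str) -> str:
    """Return base + combining mark, preferring a precomposed character.

    If no precomposed character exists and the mark goes above an i or j,
    the dotless variant is used as the base instead.
    """
    composed = unicodedata.normalize("NFC", base + mark)
    if len(composed) == 1:
        return composed  # e.g. i + diaeresis -> ï
    if base in DOTLESS and unicodedata.combining(mark) == ABOVE:
        return DOTLESS[base] + mark
    return composed


# i + double acute has no precomposed form, so the dotless i is used.
print([f"U+{ord(c):04X}" for c in accented("i", "\u030b")])  # ['U+0131', 'U+030B']
```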

@@ -2050,10 +2060,27 @@ def accent(self, s, loc, toks):
accent_box = AutoWidthChar(
'\\' + accent, sym.width, state, char_class=Accent)
else:
# Check if accent and character can be combined
oscargus (Member, Author)

One can possibly consider splitting the accents into those that may have precomposed characters and those that may not.
https://en.wikipedia.org/wiki/List_of_precomposed_Latin_characters_in_Unicode

Possibly one should also check that the character is one of the standard Latin characters, although that may mean that characters precomposed with two accents do not work (whether they even work to start with should be checked...).
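Both ideas can be probed directly from Unicode data: for each combining mark, list which basic Latin letters have a precomposed form, which suggests whether an accent belongs to the "may have precomposed characters" group. The last line bears on the two-accent concern: Unicode does define some characters with two accents, and NFC reaches them when the base and both marks are normalized together; whether nested mathtext accents would ever present them that way is still the open question. The helper precomposable_bases is hypothetical and restricted to ASCII letters as suggested.

```python
import string
import unicodedata


def precomposable_bases(mark: str) -> str:
    """Return the ASCII letters that have a precomposed form with mark."""
    return "".join(
        c for c in string.ascii_letters
        if len(unicodedata.normalize("NFC", c + mark)) == 1)


print(precomposable_bases("\u030b"))         # double acute, as in Hungarian ő, ű
print("p" in precomposable_bases("\u0327"))  # False: no precomposed p with cedilla

# Two accents: NFC composes a + circumflex + acute step by step to U+1EA5.
print(hex(ord(unicodedata.normalize("NFC", "a\u0302\u0301"))))  # 0x1ea5
```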

@oscargus (Member, Author)

It turns out that for some characters the caron (\check) is written like that: https://www.compart.com/en/unicode/U+0165

Reviewers: No reviews
Assignees: No one assigned
Projects: None yet
Milestone: No milestone
Participants: @oscargus, @anntzer
