Add support for `\check` (#7738) and the brief forms in https://en.wikibooks.org/wiki/LaTeX/Special_Characters (double acute is new, the others just use the standard single-letter names).

In addition, this replaces a character + combining accent with a single character when one is available, as mentioned in #4561 (comment). This means that e.g. `\" i` now works and is properly replaced with ï.
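As a minimal sketch of the replacement described above (not the PR's actual code), Python's `unicodedata.normalize` with form `'NFC'` merges a base character and a combining accent whenever Unicode defines a precomposed character. The helper name `compose` is hypothetical.

```python
import unicodedata


def compose(base, combining):
    """Return the precomposed character for base + combining, or None.

    Hypothetical helper for illustration; not part of Matplotlib.
    """
    combined = unicodedata.normalize('NFC', base + combining)
    # NFC yields a single code point only if a precomposed form exists.
    return combined if len(combined) == 1 else None


print(compose('i', '\u0308'))  # i + combining diaeresis (776)
print(compose('p', '\u0327'))  # p + combining cedilla
```

The first call gives the single character ï; the second returns `None` because Unicode has no precomposed p with cedilla, so the accent would still have to be drawn separately.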

Check how this works with cmr10
Currently it is not checking whether the combined single character exists in the font; no idea how to do that efficiently (maybe add a kwarg and/or rcParam so that this can be turned off?)
Add tests
Add release note

PR Checklist

Tests and Styling

Has pytest style unit tests (and `pytest` passes).
Is Flake 8 compliant (install `flake8-docstrings` and run `flake8 --docstring-convention=all`).

Documentation

New features are documented, with examples if plot related.
New features have an entry in `doc/users/next_whats_new/` (follow instructions in README.rst there).
API changes documented in `doc/api/next_api_changes/` (follow instructions in README.rst there).
Documentation is sphinx and numpydoc compliant (the docs should build without error).

oscargus added the topic: text/mathtext label

Jun 2, 2022

Member, Author

oscargus commented Jun 2, 2022

The remaining errors after removing the single letter cases above (keeping H) are:

Now:

Earlier:

so a consequence of the actual characters being used.

Now:

Earlier:

The addition of `check` results in `checkmark` being used here. I do not really understand this test, nor the use of `_accentprefixed`.

Now:

Earlier:

So a consequence of the combined character not being in the font.

For the first, and I assume the second, case, the right thing would be to update the images.

For the final case, there should be some checking if the glyph exists in the used font.

oscargus force-pushed the moreaccentsabove branch 2 times, most recently from 897487e to df3add6

June 6, 2022 12:32

Contributor

anntzer commented Jun 11, 2022

`accentprefixed` is being handled (removed) at #22950.

Member, Author

oscargus commented Jun 12, 2022

@anntzer Do you know if #22950 will enable using single-character accents (where the character is also the starting character of another LaTeX symbol)?

Also, do you have any idea how one can detect if a glyph actually exists as in the ṡ turning into ¤ in the image above? (I do not think it is Matplotlib that does that substitution?)

Contributor

oscargus commented Jun 12, 2022

Thanks! Ahh, I knew I had seen that somewhere! Grepped for ¤ though...

oscargus force-pushed the moreaccentsabove branch from df3add6 to 2529261

June 12, 2022 11:18

Member, Author

oscargus commented Jun 12, 2022 (edited)

I'm wondering if one should introduce some rcParam for the replacement. If I understand it correctly, it may not be possible for the parser to actually know the exact font being used? (Only like 'rm')

Edit: Inkscape was not in the path due to a reinstall...

Also, it seems like the svg output actually handles ṡ, but not the pdf or png output. Checking the source, it seems like something converts the combined character back into a combined accent and character. Not sure what though.

Anyway, I am wondering if one should possibly try to decompose the characters once the `_get_glyph` operation fails?

Example: (not relevant anymore, but may still be of interest)

		import unicodedata

		accent = chr(775)  # U+0307, combining dot above
		withcombiningaccent = 's' + accent
		print(withcombiningaccent, len(withcombiningaccent))
		# NFC composes base + accent into one code point when one exists
		combined = unicodedata.normalize('NFC', withcombiningaccent)
		print(combined, len(combined))
		print(ord(combined))

This shows that it correctly finds https://www.codetable.net/decimal/7777

One can do `unicodedata.normalize('NFD', chr(7777))` to get the two characters back again.
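The decompose-on-failure idea could be sketched roughly as follows. This is only an illustration: `has_glyph` is a hypothetical callable standing in for whatever the font lookup reports, and the function name `glyph_or_fallback` is made up for this example.

```python
import unicodedata


def glyph_or_fallback(char, has_glyph):
    """Return the characters to draw for `char`.

    `has_glyph` is a hypothetical callable reporting whether the current
    font contains a glyph for a single character.
    """
    if has_glyph(char):
        return [char]
    # NFD splits a precomposed character into base + combining mark(s).
    return list(unicodedata.normalize('NFD', char))


# Pretend the font only covers ASCII and the combining dot above.
available = set(map(chr, range(128))) | {'\u0307'}
print(glyph_or_fallback(chr(7777), available.__contains__))
```

With this toy font, the precomposed ṡ is missing, so the function hands back `s` and the combining dot above for the existing accent layout to handle.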

~~However, in the svg output~~

Member, Author

oscargus commented Jun 12, 2022

I also replaced some of the accents with the "proper" combining accent. This breaks another test, but avoids having to resize `\circ`.

oscargus commented

Jun 12, 2022

lib/matplotlib/_mathtext_data.py Outdated

		@@ -999,9 +999,14 @@
		'combiningdiaeresis' : 776,
		'combiningtilde' : 771,
		'combiningrightarrowabove' : 8407,
		'combiningleftarrowabove' : 8406,

Member, Author

oscargus Jun 12, 2022

A bit of aligning required here and a few lines down.

Contributor

anntzer commented Jun 12, 2022

Perhaps split out the addition of new accents as a separate PR, which should be fairly uncontroversial?

I suspect that general handling of combining characters would basically require harfbuzz (which knows how to position an accent by itself, e.g. the classic "zalgo" text h̷̡̦͚́͛̅̔̅̊͘ě̶͚̣̭́̉͜ļ̴͚͙̝̑̒l̸̛̙̹ͅơ̵͎̻͔̯̊ ̶̨̨͖̥̺͓̽̋̒͝w̶̨̗̻̥̜͍̮̏͛͒͝o̷̟͆̍̓̚ŗ̵̢͔̦̑͗̑̑̃l̸̲̥̲̹͖̔̇̾̏͆d̴͍̲̓̄̑̉̌̇͜) + switching from bakoma to lm-math, to have access to the combining characters... Still,

> If I understand it correctly, it may not be possible for the parser to actually know the exact font being used?

I think that's actually possible? E.g. `Char._update_metrics` does `self._metrics = self.font_output.get_metrics(self.font, self.font_class, self.c, self.fontsize, self.dpi)`, which loads the metrics of a glyph in the current concrete font, so that can certainly check whether the glyph exists in that font.
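A rough sketch of such an existence check, assuming the concrete font can expose a mapping from code points to glyph indices (Matplotlib's `FT2Font.get_charmap()` returns a dict of that shape; whether it is convenient to reach from the parser at this point is an assumption). The function name `pick_symbol` and the plain-dict charmap are illustrative only.

```python
import unicodedata


def pick_symbol(base, combining, charmap):
    """Choose the precomposed char if the font has it, else None.

    `charmap` maps code points to glyph indices (stand-in for a real font).
    """
    combined = unicodedata.normalize('NFC', base + combining)
    if len(combined) == 1 and ord(combined) in charmap:
        return combined
    return None  # caller keeps drawing base and accent separately


# Toy charmap: contains ï (U+00EF) but not ṡ (U+1E61).
toy_charmap = {ord('i'): 1, ord('s'): 2, 0xEF: 3}
print(pick_symbol('i', '\u0308', toy_charmap))
print(pick_symbol('s', '\u0307', toy_charmap))
```

Returning `None` rather than raising leaves the previous accent-stacking path available as the fallback.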

oscargus force-pushed the moreaccentsabove branch 2 times, most recently from ce1fa41 to e671b41

June 12, 2022 14:38

oscargus added 2 commits

June 12, 2022 18:31

Add support for more accents in mathtext

9bf6e87

Add new reference images

9971ce1

oscargus force-pushed the moreaccentsabove branch from e671b41 to 9971ce1

June 12, 2022 16:33

Member, Author

oscargus commented Jun 12, 2022

You are correct that it was possible. I couldn't follow the order of things happening properly.

I think that the zalgo support is actually not that much affected by this. It is just that when proper glyphs are available they will be used; if not, it will be as before (which I guess supported zalgo to some extent). See for example the test with `r'$\mathring{A} \AA$'`, where both characters now render identically (Å). (This is a rather good test for this feature, possibly extended with a few more Unicode characters.)

One may even consider checking whether a Unicode character can be split.

Anyway, this should really wait until #22950 is merged so that more accents can be added. One could also consider adding support for other combining accents, like cedilla and ogonek, which at least should work when combined characters are available. Maybe one should have two separate groups of accents: the current ones, where it is possible to "create" decent-looking combinations, and those like cedilla and ogonek, which may have a valid combined glyph. For the latter, one could raise an error if they do not combine or the glyph is not available.

(I tried to get combining accents below working, but I had some issues with aligning them correctly, especially since cedilla and ogonek should attach without a gap and I didn't get that to work for e.g. p, which probably no one wants, but still...)
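The two-group idea could look roughly like the sketch below. The group names, their membership, and the function `resolve` are all hypothetical, and the sketch only checks whether Unicode defines a precomposed character, not whether the font actually has the glyph.

```python
import unicodedata

# Hypothetical grouping; the names and membership are illustrative only.
STACKABLE = {'hat': '\u0302', 'dot': '\u0307'}        # can be drawn above
ATTACHED = {'cedilla': '\u0327', 'ogonek': '\u0328'}  # need a real glyph


def resolve(accent, base):
    """Return the list of characters to lay out for accent + base."""
    mark = STACKABLE.get(accent) or ATTACHED[accent]
    combined = unicodedata.normalize('NFC', base + mark)
    if len(combined) == 1:
        return [combined]    # precomposed character exists
    if accent in STACKABLE:
        return [base, mark]  # fall back to stacking the accent
    raise ValueError(f'no precomposed character for {accent!r} + {base!r}')


print(resolve('cedilla', 'c'))
print(resolve('hat', 'p'))
```

Here `resolve('cedilla', 'p')` raises `ValueError`, since there is no precomposed p with cedilla and the attached accents are not stacked manually in this sketch.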

There are now some more things changed:

macron and overline are different
if possible, a dotless i is used (as LaTeX does nowadays)
there are a number of new test images, primarily for illustration, as I expect them to change (note that `\check` is not working)

oscargus commented

Jun 12, 2022

lib/matplotlib/_mathtext.py

		@@ -2050,10 +2060,27 @@ def accent(self, s, loc, toks):
		accent_box = AutoWidthChar(
		'\\' + accent, sym.width, state, char_class=Accent)
		else:
		# Check if accent and character can be combined

Member, Author

oscargus Jun 12, 2022

One can possibly consider splitting the accents into those that may have precomposed characters and those that may not.
https://en.wikipedia.org/wiki/List_of_precomposed_Latin_characters_in_Unicode

Possibly one should also check that the character is one of the standard Latin characters, although that may mean that characters precomposed with two accents no longer work (whether they work to begin with should be checked...).
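To make the two-accent concern concrete, here is a small sketch that restricts the base to ASCII Latin letters but still allows several marks. The helper `precomposed` is hypothetical, and it assumes nested accents would be collected and passed in together, which may not match how the parser actually nests them.

```python
import string
import unicodedata


def precomposed(base, *marks):
    """Return a single precomposed char for an ASCII Latin base, else None."""
    if base not in string.ascii_letters:
        return None
    combined = unicodedata.normalize('NFC', base + ''.join(marks))
    return combined if len(combined) == 1 else None


print(precomposed('a', '\u0302'))            # one accent
print(precomposed('a', '\u0302', '\u0301'))  # two accents
print(precomposed('\u03b1', '\u0301'))       # Greek alpha is rejected here
```

Applying the Latin check to the original base, rather than to the intermediate â, is what keeps the two-accent case (ấ) working in this sketch.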