Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Allow multi-character symbols#6489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
T0mstone wants to merge6 commits intotypst:main
base:main
Choose a base branch
Loading
fromT0mstone:multi-char-symbols

Conversation

T0mstone
Copy link
Contributor

@T0mstoneT0mstone commentedJun 23, 2025
edited
Loading

This is the companion PR tocodex#92.

Note that this is blocked on (and already rebased onto)#6441 and also blocked oncodex#20 as explained below.

A bunch of things expected symbols to only have one character.
I mostly changed those to add a new error when the symbol has more than one character. This is technically non-breaking, since multi-character symbols were also an error before, but the error now moved from at their creation to at their use in those cases.

Other technical notes:

  • Maybe there should be a newRepr for runtime-single-character symbols instead of just allocating them like I've done here?
  • I'm REALLY unsure aboutlayout_symbol... I'm not really familiar with the layout code, and this change just made it work, but I think a single symbol should maybe be a single fragment? Either way, this probably interacts withcodex#20 and any typst-side integration of that (idk if there's already a PR for that), so it's probably best to wait on that before merging this.

@T0mstone
Copy link
ContributorAuthor

No clue how to fix the failures intypst-ide andtypst-docs.

I guess for the IDE,CompletionKind::Symbol can becomeSymbol(EcoString), but I think that field is only used in proprietary code, so someone from the Team would have to say whether that's a worthwhile change.

As for the docs, this is probably a way more substantial change, given how the current docs store the codepoint as au32, so I also don't think I can fix this by myself.

@mkorje
Copy link
Collaborator

mkorje commentedJun 23, 2025
edited
Loading

I'm REALLY unsure aboutlayout_symbol... I'm not really familiar with the layout code, and this change just made it work, but I think a single symbol should maybe be a single fragment? Either way, this probably interacts withcodex#20 and any typst-side integration of that (idk if there's already a PR for that), so it's probably best to wait on that before merging this.

We can now just callGlyphFragment::new with the whole string (#6336) (this function will error if after shaping the string we don't get a single glyph, i.e. if the string isn't a single grapheme cluster), and instead do something likeelem.text().chars().flat_map(|c| to_style(c, ...)) to style it.

Copy link
Member

@laurmaedjelaurmaedje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe there should be a new Repr for runtime-single-character symbols instead of just allocating them like I've done here?

It would indeed be nice if those cases wouldn't allocate.

I guess for the IDE, CompletionKind::Symbol can become Symbol(EcoString), but I think that field is only used in proprietary code, so someone from the Team would have to say whether that's a worthwhile change.

It's actually not even used in proprietary code atm, but it used to be and might be in the future. But either wayEcoString is okay.

As for the docs, this is probably a way more substantial change, given how the current docs store the codepoint as a u32, so I also don't think I can fix this by myself.

The non-codepoint stuff should be fairly straightforward (i.e. Accent and math class just become None). Thecodepoint can be replaced by avalue string field. I'll then adjust the proprietary part of the docs generator accordingly.

T0mstone reacted with thumbs up emoji

cast! {
SymbolVariant,
c: char => Self(EcoString::new(), c),
c: char => Self(EcoString::new(), c.into()),
s: EcoString => Self(EcoString::new(), s),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not sure whether we made a decision on whether we want to only accept a single grapheme cluster, but worth bringing up again.

@T0mstone
Copy link
ContributorAuthor

Not sure whether we made a decision on whether we want to only accept a single grapheme cluster, but worth bringing up again.

Good point, let's bring that discussion to this thread.

I'm personally against it (as in, I think arbitrary strings should be allowed) and I don't really remember the arguments for why this would be necessary (So if anyone else has any, please introduce or repeat them here). The only one I can think of islayout_symbol requiring a single glyph, but requiring a single grapheme cluster doesn't fix that either since e.g. emoji sequences can fall back to multiple glyphs if the font doesn't support them. For example, the "people holding hands" emoji (🧑‍🤝‍🧑) I added incodex#104 is a single grapheme cluster that can fall back to "🧑🤝🧑".

Using "codepoint" is more accurate and lines up with what typst's standard library uses
@T0mstone
Copy link
ContributorAuthor

Another technical detail: I don't know ifSymbolElem::func would be better by first checking if the string is a single codepoint and then matching on that. Maybe this is an irrelevant concern tho and then the decision could be made on principle (either by deciding that matching on strings makes it easier to add multi-char callables in the future, or by deciding that we'll never have multi-char callables anyway, so it'd be better to convert tochar first and then match on that).

@T0mstone
Copy link
ContributorAuthor

I changed the symbol repr since I think the previous behavior was probably unintended (such assymbol("\"") repr'ing assymbol(""")), so it now usesrepr again to format the strings (both for the modifier sets and for the values).

Other than that, I made some minor changes to no longer use the variable namec for things that are now strings and to changeoption::IntoIter into the more idiomaticiter::Once.

@T0mstone
Copy link
ContributorAuthor

T0mstone commentedJul 10, 2025
edited
Loading

It would indeed be nice if those cases wouldn't allocate.

After thinking about this a bit and trying some stuff, I found that this isn't really possible sinceSymbol::get returns&str, butchar is not UTF-8 encoded. We could usehttps://docs.rs/utf8char/0.2.0/utf8char/struct.Utf8Char.html or hand-roll something similar.1 Do you think that'd be worth it?

Footnotes

  1. One thing I tried was[u8; 4] combined withchar::encode_utf8 andstr::from_utf8, but the conversions were a bit excessive, so this would then probably be best as a wrapper type intypst-utils or something.

@T0mstone
Copy link
ContributorAuthor

With this, the tests are now passing, butlayout_symbol remains unable to handle multi-char values. As mentioned in#6489 (comment), I think it needs to be able to handle multiple glyphs regardless of whether we require symbols to be single grapheme clusters or not, but I don't know how to implement this.

And secondly, I should mention that the code as it is now permits empty strings as symbol values, which might cause some other issues. We probably just want to forbid those, right? (In which case I think onlySymbol::construct would need to be adjusted to reject them).

Comment on lines +736 to +738
math_class: value_char.and_then(|c|{
typst_utils::default_math_class(c).map(math_class_name)
}),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We might want to be more subtle than that here. Ifvalue is the string"⤴\u{FE0E}" (U+2934, which is a Relation, followed by a text presentation selector), we want the math class to be that of the base character of the variation sequence.

Probably same for theaccent below.

@mkorje
Copy link
Collaborator

With this, the tests are now passing, butlayout_symbol remains unable to handle multi-char values. As mentioned in#6489 (comment), I think it needs to be able to handle multiple glyphs regardless of whether we require symbols to be single grapheme clusters or not, but I don't know how to implement this.

I think something like this should work:
diff --git a/crates/typst-layout/src/math/fragment.rs b/crates/typst-layout/src/math/fragment.rsindex 758dd401..53aaaa4f 100644--- a/crates/typst-layout/src/math/fragment.rs+++ b/crates/typst-layout/src/math/fragment.rs@@ -15,6 +15,7 @@ use typst_library::text::{features, language, Font, Glyph, TextElem, TextItem}; use typst_syntax::Span; use typst_utils::{default_math_class, Get}; use unicode_math_class::MathClass;+use unicode_segmentation::UnicodeSegmentation;  use super::MathContext; use crate::inline::create_shape_plan;@@ -278,6 +279,8 @@ impl GlyphFragment {         text: &str,         span: Span,     ) -> SourceResult<GlyphFragment> {+        assert!(text.graphemes(true).count() == 1);+         let mut buffer = UnicodeBuffer::new();         buffer.push_str(text);         buffer.set_language(language(styles));@@ -300,6 +303,7 @@ impl GlyphFragment {         );          let buffer = rustybuzz::shape_with_plan(font.rusty(), &plan, buffer);+        // TODO: deal with multiple glyphs.         if buffer.len() != 1 {             bail!(span, "did not get a single glyph after shaping {}", text);         }diff --git a/crates/typst-layout/src/math/text.rs b/crates/typst-layout/src/math/text.rsindex e76fb2e7..4e819144 100644--- a/crates/typst-layout/src/math/text.rs+++ b/crates/typst-layout/src/math/text.rs@@ -129,44 +129,48 @@ pub fn layout_symbol(     ctx: &mut MathContext,     styles: StyleChain, ) -> SourceResult<()> {-    assert!(-        elem.text.len() <= 4 && elem.text.chars().count() == 1,-        "TODO: layout multi-char symbol"-    );-    let elem_char = elem-        .text-        .chars()-        .next()-        .expect("TODO: should an empty symbol value forbidden?");--    // Switch dotless char to normal when we have the dtls OpenType feature.-    // This should happen before the main styling pass.-    let dtls = style_dtls();-    let (unstyled_c, symbol_styles) = match try_dotless(elem_char) {-        Some(c) if has_dtls_feat(ctx.font) => (c, styles.chain(&dtls)),-        _ => (elem_char, styles),-    };-     let variant = styles.get(EquationElem::variant);     let bold = styles.get(EquationElem::bold);     let italic = styles.get(EquationElem::italic);+    let dtls = style_dtls();+    let has_dtls_feat = has_dtls_feat(ctx.font);+    for text in elem.text.graphemes(true) {+        // Switch dotless char to normal when we have the dtls OpenType feature.+        // This should happen before the main styling pass.+        let (text, symbol_styles) =+            if has_dtls_feat && text.chars().any(|c| try_dotless(c).is_some()) {+                (+                    text.chars()+                        .map(|c| try_dotless(c).unwrap_or(c))+                        .collect::<EcoString>(),+                    styles.chain(&dtls),+                )+            } else {+                (text.into(), styles)+            };++        let text: EcoString = text+            .chars()+            .flat_map(|c| {+                let style = MathStyle::select(c, variant, bold, italic);+                to_style(c, style)+            })+            .collect();-    let style = MathStyle::select(unstyled_c, variant, bold, italic);-    let text: EcoString = to_style(unstyled_c, style).collect();--    let fragment: MathFragment =-        match GlyphFragment::new(ctx.font, symbol_styles, &text, elem.span()) {-            Ok(mut glyph) => {-                adjust_glyph_layout(&mut glyph, ctx, styles);-                glyph.into()-            }-            Err(_) => {-                // Not in the math font, fallback to normal inline text layout.-                // TODO: Should replace this with proper fallback in [`GlyphFragment::new`].-                layout_inline_text(&text, elem.span(), ctx, styles)?.into()-            }-        };-    ctx.push(fragment);+        let fragment: MathFragment =+            match GlyphFragment::new(ctx.font, symbol_styles, &text, elem.span()) {+                Ok(mut glyph) => {+                    adjust_glyph_layout(&mut glyph, ctx, styles);+                    glyph.into()+                }+                Err(_) => {+                    // Not in the math font, fallback to normal inline text layout.+                    // TODO: Should replace this with proper fallback in [`GlyphFragment::new`].+                    layout_inline_text(&text, elem.span(), ctx, styles)?.into()+                }+            };+        ctx.push(fragment);+    }     Ok(()) }

(The code is a bit ugly, sorry about that)

the "people holding hands" emoji (🧑‍🤝‍🧑) I added incodex#104 is a single grapheme cluster that can fall back to "🧑🤝🧑".

Regarding this if the fallback happens during shaping in math (which it basically always will), the above code just lets it error out for the time being with the "didn't get a single glyph after shaping error", after which it'll fallback anyways to the inline text layout.

Another thing: I thought that the text0x0030 0xFE0E or0x0030 0xFE0F (can replace 0x0030 with any of the digits 0-9) could lead to problems as if it were styled, say bold, the codepoint would change from0x0030 and this would now be a non-standardised variation sequence. But this is a non-issue as the shaper will just ignore it.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@laurmaedjelaurmaedjelaurmaedje left review comments

@MDLC01MDLC01MDLC01 left review comments

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@T0mstone@mkorje@laurmaedje@MDLC01

[8]ページ先頭

©2009-2025 Movatter.jp