Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork1.2k
Allow multi-character symbols#6489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
No clue how to fix the failures in I guess for the IDE, As for the docs, this is probably a way more substantial change, given how the current docs store the codepoint as a |
mkorje commentedJun 23, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
We can now just call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe there should be a new Repr for runtime-single-character symbols instead of just allocating them like I've done here?
It would indeed be nice if those cases wouldn't allocate.
I guess for the IDE, CompletionKind::Symbol can become Symbol(EcoString), but I think that field is only used in proprietary code, so someone from the Team would have to say whether that's a worthwhile change.
It's actually not even used in proprietary code atm, but it used to be and might be in the future. But either wayEcoString
is okay.
As for the docs, this is probably a way more substantial change, given how the current docs store the codepoint as a u32, so I also don't think I can fix this by myself.
The non-codepoint stuff should be fairly straightforward (i.e. Accent and math class just become None). Thecodepoint
can be replaced by avalue
string field. I'll then adjust the proprietary part of the docs generator accordingly.
cast! { | ||
SymbolVariant, | ||
c: char => Self(EcoString::new(), c), | ||
c: char => Self(EcoString::new(), c.into()), | ||
s: EcoString => Self(EcoString::new(), s), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Not sure whether we made a decision on whether we want to only accept a single grapheme cluster, but worth bringing up again.
Uh oh!
There was an error while loading.Please reload this page.
Good point, let's bring that discussion to this thread. I'm personally against it (as in, I think arbitrary strings should be allowed) and I don't really remember the arguments for why this would be necessary (So if anyone else has any, please introduce or repeat them here). The only one I can think of is |
Using "codepoint" is more accurate and lines up with what typst's standard library uses
Another technical detail: I don't know if |
I changed the symbol repr since I think the previous behavior was probably unintended (such as Other than that, I made some minor changes to no longer use the variable name |
T0mstone commentedJul 10, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
After thinking about this a bit and trying some stuff, I found that this isn't really possible since Footnotes
|
With this, the tests are now passing, but And secondly, I should mention that the code as it is now permits empty strings as symbol values, which might cause some other issues. We probably just want to forbid those, right? (In which case I think only |
math_class: value_char.and_then(|c|{ | ||
typst_utils::default_math_class(c).map(math_class_name) | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We might want to be more subtle than that here. Ifvalue
is the string"⤴\u{FE0E}"
(U+2934, which is a Relation, followed by a text presentation selector), we want the math class to be that of the base character of the variation sequence.
Probably same for theaccent
below.
I think something like this should work:diff --git a/crates/typst-layout/src/math/fragment.rs b/crates/typst-layout/src/math/fragment.rsindex 758dd401..53aaaa4f 100644--- a/crates/typst-layout/src/math/fragment.rs+++ b/crates/typst-layout/src/math/fragment.rs@@ -15,6 +15,7 @@ use typst_library::text::{features, language, Font, Glyph, TextElem, TextItem}; use typst_syntax::Span; use typst_utils::{default_math_class, Get}; use unicode_math_class::MathClass;+use unicode_segmentation::UnicodeSegmentation; use super::MathContext; use crate::inline::create_shape_plan;@@ -278,6 +279,8 @@ impl GlyphFragment { text: &str, span: Span, ) -> SourceResult<GlyphFragment> {+ assert!(text.graphemes(true).count() == 1);+ let mut buffer = UnicodeBuffer::new(); buffer.push_str(text); buffer.set_language(language(styles));@@ -300,6 +303,7 @@ impl GlyphFragment { ); let buffer = rustybuzz::shape_with_plan(font.rusty(), &plan, buffer);+ // TODO: deal with multiple glyphs. if buffer.len() != 1 { bail!(span, "did not get a single glyph after shaping {}", text); }diff --git a/crates/typst-layout/src/math/text.rs b/crates/typst-layout/src/math/text.rsindex e76fb2e7..4e819144 100644--- a/crates/typst-layout/src/math/text.rs+++ b/crates/typst-layout/src/math/text.rs@@ -129,44 +129,48 @@ pub fn layout_symbol( ctx: &mut MathContext, styles: StyleChain, ) -> SourceResult<()> {- assert!(- elem.text.len() <= 4 && elem.text.chars().count() == 1,- "TODO: layout multi-char symbol"- );- let elem_char = elem- .text- .chars()- .next()- .expect("TODO: should an empty symbol value forbidden?");-- // Switch dotless char to normal when we have the dtls OpenType feature.- // This should happen before the main styling pass.- let dtls = style_dtls();- let (unstyled_c, symbol_styles) = match try_dotless(elem_char) {- Some(c) if has_dtls_feat(ctx.font) => (c, styles.chain(&dtls)),- _ => (elem_char, styles),- };- let variant = styles.get(EquationElem::variant); let bold = styles.get(EquationElem::bold); let italic = styles.get(EquationElem::italic);+ let dtls = style_dtls();+ let has_dtls_feat = has_dtls_feat(ctx.font);+ for text in elem.text.graphemes(true) {+ // Switch dotless char to normal when we have the dtls OpenType feature.+ // This should happen before the main styling pass.+ let (text, symbol_styles) =+ if has_dtls_feat && text.chars().any(|c| try_dotless(c).is_some()) {+ (+ text.chars()+ .map(|c| try_dotless(c).unwrap_or(c))+ .collect::<EcoString>(),+ styles.chain(&dtls),+ )+ } else {+ (text.into(), styles)+ };++ let text: EcoString = text+ .chars()+ .flat_map(|c| {+ let style = MathStyle::select(c, variant, bold, italic);+ to_style(c, style)+ })+ .collect();- let style = MathStyle::select(unstyled_c, variant, bold, italic);- let text: EcoString = to_style(unstyled_c, style).collect();-- let fragment: MathFragment =- match GlyphFragment::new(ctx.font, symbol_styles, &text, elem.span()) {- Ok(mut glyph) => {- adjust_glyph_layout(&mut glyph, ctx, styles);- glyph.into()- }- Err(_) => {- // Not in the math font, fallback to normal inline text layout.- // TODO: Should replace this with proper fallback in [`GlyphFragment::new`].- layout_inline_text(&text, elem.span(), ctx, styles)?.into()- }- };- ctx.push(fragment);+ let fragment: MathFragment =+ match GlyphFragment::new(ctx.font, symbol_styles, &text, elem.span()) {+ Ok(mut glyph) => {+ adjust_glyph_layout(&mut glyph, ctx, styles);+ glyph.into()+ }+ Err(_) => {+ // Not in the math font, fallback to normal inline text layout.+ // TODO: Should replace this with proper fallback in [`GlyphFragment::new`].+ layout_inline_text(&text, elem.span(), ctx, styles)?.into()+ }+ };+ ctx.push(fragment);+ } Ok(()) } (The code is a bit ugly, sorry about that)
Regarding this if the fallback happens during shaping in math (which it basically always will), the above code just lets it error out for the time being with the "didn't get a single glyph after shaping error", after which it'll fallback anyways to the inline text layout. Another thing: I thought that the text |
Uh oh!
There was an error while loading.Please reload this page.
This is the companion PR tocodex#92.
Note that this is blocked on (and already rebased onto)#6441 and also blocked oncodex#20 as explained below.A bunch of things expected symbols to only have one character.
I mostly changed those to add a new error when the symbol has more than one character. This is technically non-breaking, since multi-character symbols were also an error before, but the error now moved from at their creation to at their use in those cases.
Other technical notes:
Repr
for runtime-single-character symbols instead of just allocating them like I've done here?layout_symbol
... I'm not really familiar with the layout code, and this change just made it work, but I think a single symbol should maybe be a single fragment? Either way, this probably interacts withcodex#20 and any typst-side integration of that (idk if there's already a PR for that), so it's probably best to wait on that before merging this.