Description
Does this issue occur when all extensions are disabled?: Yes
- VS Code Version: 1.100.3
- OS Version: macOS 15.5
Steps to Reproduce:
- Open the integrated VSCode terminal;
- Enter (e.g.) `echo 'a\U01f6d6z'` or `echo 'a\U01fae0z'`;
- Notice that the emoji only advances 1 cell, but occupies 2 cells, causing it to overlap the following `z` character.
Most emoji are recognised as 2 cells, but it seems many are not, including several which have already been around for many versions of Unicode. Yet these are all already recognised by V8's RegExp with `\p{Basic_Emoji}`.
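As a quick check of that claim, the sketch below tests the two code points from the steps above against `\p{Basic_Emoji}`. The helper name `isBasicEmoji` is hypothetical. `\p{Basic_Emoji}` is a property of strings and requires the RegExp `v` flag, so the helper returns `null` on engines that do not support it; the result also depends on the Unicode version the engine ships with.

```javascript
// Returns true/false for a single code point, or null when the engine
// lacks the RegExp `v` flag that \p{Basic_Emoji} requires.
function isBasicEmoji(ch) {
  let re;
  try {
    re = new RegExp('^\\p{Basic_Emoji}$', 'v');
  } catch {
    return null; // older engine: properties of strings are not supported
  }
  return re.test(ch);
}

// The two code points used in the reproduction steps.
for (const cp of [0x1f6d6, 0x1fae0]) {
  const ch = String.fromCodePoint(cp);
  console.log(`U+${cp.toString(16).toUpperCase()}`, isBasicEmoji(ch));
}
```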
A similar issue also applies to Ideographic characters (matching `\p{Ideographic}`) which VSCode does not recognise (e.g. `echo 'a\U016fe4z'`): they are rendered as a single cell replacement character and advance 1 cell, but officially should all be wide and occupy 2 cells.
There are also issues with newer non-spacing combining characters (matching `\p{Mn}`) such as `echo 'a\u0897z'`. Again these are not recognised and render as a single cell replacement character with 1 cell advance, but officially should have an advance of 0.
In total I have found 809 characters in these groups which appear to have incorrect advance widths. All of these characters use the correct advance width when using the macOS built-in terminal program. (`\u{016fe4}` has an advance width of 0 there despite being classified as ideographic: this character is classed as both non-spacing mark and ideographic, so non-spacing mark takes priority.)
These inconsistencies cause issues when writing console applications, where it can be important to know the advance width which will be applied to each character in order to predict whether line wrapping will occur.
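To illustrate why this matters, here is a minimal sketch of how a console application might predict wrapping. The function `rowsNeeded` and its `cellWidth` callback are hypothetical names, and the wrapping rule (move to a new row when the next character does not fit) is a simplifying assumption rather than a description of any particular terminal.

```javascript
// Hypothetical helper: counts how many terminal rows `text` would occupy,
// given a per-code-point width function supplied by the caller.
function rowsNeeded(text, columns, cellWidth) {
  let rows = 1;
  let col = 0;
  for (const ch of text) {      // iterates by code point, not UTF-16 unit
    const w = cellWidth(ch);
    if (col + w > columns) {    // character does not fit: wrap first
      rows += 1;
      col = 0;
    }
    col += w;
  }
  return rows;
}

// Toy width functions, for illustration only.
const everythingNarrow = () => 1;
const astralIsWide = (ch) => (ch.codePointAt(0) > 0xffff ? 2 : 1);

console.log(rowsNeeded('a\u{1F6D6}z', 3, everythingNarrow));
console.log(rowsNeeded('a\u{1F6D6}z', 3, astralIsWide));
```

If the application and the terminal disagree about a single character's width, as in the two calls above, they can also disagree about where the line breaks.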
Ideally VSCode would use the `\p{Basic_Emoji}`, `\p{Ideographic}`, and `\p{Mn}` character properties to determine advance widths of characters to ensure it stays somewhat up-to-date with the latest Unicode definitions. It may also be necessary to review the advance widths of characters outside these ranges, though they are less trivial to give definitive "correct" values for.
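A minimal sketch of what such a property-based lookup could look like follows. This is not VS Code's or xterm.js's actual implementation: the name `cellWidth` is hypothetical, the fallback to `\p{Emoji_Presentation}` on engines without the RegExp `v` flag is an assumption, and characters outside the three groups (control characters, other East Asian wide characters, multi-code-point sequences) are simply treated as width 1 here.

```javascript
// Sketch only: classifies a single code point by Unicode property.
const NONSPACING = /^\p{Mn}$/u;
const IDEOGRAPHIC = /^\p{Ideographic}$/u;

// \p{Basic_Emoji} is a property of strings and needs the `v` flag;
// fall back to \p{Emoji_Presentation} on engines without it (assumption).
let EMOJI;
try {
  EMOJI = new RegExp('^\\p{Basic_Emoji}$', 'v');
} catch {
  EMOJI = /^\p{Emoji_Presentation}$/u;
}

function cellWidth(ch) {
  if (NONSPACING.test(ch)) return 0;  // checked first so Mn wins over Ideographic
  if (IDEOGRAPHIC.test(ch)) return 2;
  if (EMOJI.test(ch)) return 2;
  return 1;                           // everything else: assumed narrow
}

// Code points mentioned in this report; results depend on the
// Unicode version the engine ships with.
for (const cp of [0x1f6d6, 0x1fae0, 0x16fe4, 0x0897]) {
  console.log(`U+${cp.toString(16).toUpperCase()}`, cellWidth(String.fromCodePoint(cp)));
}
```

Testing `\p{Mn}` before `\p{Ideographic}` mirrors the priority described above for characters that carry both classifications.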