Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32k
gh-131507: Refactor screen and cursor position calculations#131547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
offset = len(character_widths) - character_widths.count(0) | ||
in_wrapped_line = prompt_len + sum(character_widths) >= self.console.width | ||
prompt_len, char_widths = self.screeninfo[i] | ||
offset = len(char_widths) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Isn't this breaking 0-width characters (like combining marks)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
No, this isfixing 0-width characters.
Edit: to explain more clearly. The code I removedlooks like it was meant to help with 0-width characters, but the code was actually the wrong way round. The reason why was that originally the "0-width" was historically used in pyrepl to show characters like ^C as two-wide by returning [1, 0] fromdisp_str()
. You can still see that in the docstring I edited. But either we made a change to undo that when originally merging pyrepl, or they changed it over the course of history so it didn't work anyway. So those 0-counts weren't doing anything productive anyway, but were only adding to our confusion. As proof, all zero-width characters are currently printed out as part of the Other unicodedata group:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
To be clear, the screenshot represents the state before this PR, and this PR is not affecting it in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The way I discovered all this was that I originally wanted disp_str() to emit syntax highlighting as separate zero-width characters and then it turned out that the removal of the.count(0)
was necessary for the cursor positioning to work correctly. And even that was insufficient, since having zero-width characters in the stream was causing wrapping on narrow terminals to fall in infinite loops, there's an open issue about this (#126685).
This PR fixes#126685.
@@ -560,29 +530,33 @@ def setpos_from_xy(self, x: int, y: int) -> None: | |||
def pos2xy(self) -> tuple[int, int]: | |||
"""Return the x, y coordinates of position 'pos'.""" | |||
# this *is* incomprehensible, yes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
😆
def disp_str(buffer: str) -> tuple[CharBuffer, CharWidths]: | ||
r"""Decompose the input buffer into a printable variant. | ||
Returns a tuple of two lists: | ||
- the first list is the input buffer, character by character; | ||
- the second list is the visible width of each character in the input | ||
buffer. | ||
Examples: | ||
>>> utils.disp_str("a = 9") | ||
(['a', ' ', '=', ' ', '9'], [1, 1, 1, 1, 1]) | ||
""" | ||
chars: CharBuffer = [] | ||
char_widths: CharWidths = [] | ||
if not buffer: | ||
return chars, char_widths | ||
for c in buffer: | ||
if c == "\x1a": # CTRL-Z on Windows | ||
chars.append(c) | ||
char_widths.append(2) | ||
elif ord(c) < 128: | ||
chars.append(c) | ||
char_widths.append(1) | ||
elif unicodedata.category(c).startswith("C"): | ||
c = r"\u%04x" % ord(c) | ||
chars.append(c) | ||
char_widths.append(len(c)) | ||
else: | ||
chars.append(c) | ||
char_widths.append(str_width(c)) | ||
trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths) | ||
return chars, char_widths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Check if you like something of this version with some improvements:
defdisp_str(buffer:str)->tuple[CharBuffer,CharWidths]: | |
r"""Decomposetheinputbufferintoaprintablevariant. | |
Returnsatupleoftwolists: | |
-thefirstlististheinputbuffer,characterbycharacter; | |
-thesecondlististhevisiblewidthofeachcharacterintheinput | |
buffer. | |
Examples: | |
>>>utils.disp_str("a = 9") | |
(['a',' ','=',' ','9'], [1,1,1,1,1]) | |
""" | |
chars:CharBuffer= [] | |
char_widths:CharWidths= [] | |
ifnotbuffer: | |
returnchars,char_widths | |
forcinbuffer: | |
ifc=="\x1a":# CTRL-Z on Windows | |
chars.append(c) | |
char_widths.append(2) | |
eliford(c)<128: | |
chars.append(c) | |
char_widths.append(1) | |
elifunicodedata.category(c).startswith("C"): | |
c=r"\u%04x"%ord(c) | |
chars.append(c) | |
char_widths.append(len(c)) | |
else: | |
chars.append(c) | |
char_widths.append(str_width(c)) | |
trace("disp_str({buffer}) = {s}, {b}",buffer=repr(buffer),s=chars,b=char_widths) | |
returnchars,char_widths | |
CharBuffer=List[str] | |
CharWidths=List[int] | |
CTRL_Z="\x1a" | |
ASCII_MAX=127 | |
CTRL_Z_WIDTH=2 | |
ASCII_WIDTH=1 | |
@lru_cache(maxsize=1024) | |
defcached_category(c:str)->str: | |
"""Cache unicodedata.category calls for better performance.""" | |
returnunicodedata.category(c) | |
defdisp_str(buffer:str)->Tuple[CharBuffer,CharWidths]: | |
r"""Decomposetheinputbufferintoaprintablevariant. | |
Returnsatupleoftwolists: | |
-thefirstlististheinputbuffer,characterbycharacter; | |
-thesecondlististhevisiblewidthofeachcharacterintheinputbuffer. | |
Examples: | |
>>>utils.disp_str("a = 9") | |
(['a',' ','=',' ','9'], [1,1,1,1,1]) | |
""" | |
# Fast path for empty buffer | |
ifnotbuffer: | |
return [], [] | |
# Optimization for common case: pure ASCII text | |
ifall(ord(c)<ASCII_MAXforcinbuffer): | |
chars=list(buffer) | |
char_widths= [ASCII_WIDTH]*len(buffer) | |
trace("disp_str({buffer}) = {s}, {b}",buffer=repr(buffer),s=chars,b=char_widths) | |
returnchars,char_widths | |
chars:CharBuffer= [] | |
char_widths:CharWidths= [] | |
# Pre-allocate lists for better performance | |
chars= [None]*len(buffer) | |
char_widths= [0]*len(buffer) | |
fori,cinenumerate(buffer): | |
ifc==CTRL_Z:# CTRL-Z on Windows | |
chars[i]=c | |
char_widths[i]=CTRL_Z_WIDTH | |
eliford(c)<ASCII_MAX: | |
chars[i]=c | |
char_widths[i]=ASCII_WIDTH | |
elifcached_category(c).startswith("C"): | |
unicode_repr=f"\\u{ord(c):04x}" | |
chars[i]=unicode_repr | |
char_widths[i]=len(unicode_repr) | |
else: | |
chars[i]=c | |
try: | |
char_widths[i]=str_width(c) | |
exceptException: | |
# Fallback if str_width fails | |
char_widths[i]=1 | |
trace("disp_str({buffer}) = {s}, {b}",buffer=repr(buffer),s=chars,b=char_widths) | |
returnchars,char_widths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I like caching character categories, will add this to the third PR in the sequence. The other optimization you added would have to be removed anyway when adding color codes.
ifnot in_wrapped_line: | ||
offset +=1 #there's a newline in buffer | ||
pos -=offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't see why these two are equivalent :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This is equivalent, but it's subtle. Note that even in the old version, theif
test in line 580 is the same as what's under the variablein_wrapped_line
.
In the new version, we're reusing theoffset
variable that already had -1 appliedif in_wrapped_line
. So we only need to handle the newline case, which we do.
There's less computation in the new version, so it reads better to me. Italmost could be shortened further, but the break check is awkwardly in the middle there, so I kept it for equivalence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM with some comments
4cc82ff
intopython:mainUh oh!
There was an error while loading.Please reload this page.
Thanks@ambv for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13. |
…thonGH-131547)This is based offpythonGH-131509.(cherry picked from commit4cc82ff)Co-authored-by: Łukasz Langa <lukasz@langa.pl>
GH-131557 is a backport of this pull request to the3.13 branch. |
Uh oh!
There was an error while loading.Please reload this page.
This is based off#131509.
I changed a bunch of variable names to make things more obvious in
calc_screen()
andpos2xy()
. Plus very mild refactoring once the better names make it obvious which things are the same. It is helpful to me at least.The one actual change here is moving
disp_str()
to_pyrepl.utils
because this is where syntax highlighting will live as well. With that move comes a slight performance optimization that will become functionally important later:disp_str()
no longer repacks the list of characters into a string (that is later only iterated on anyway incalc_screen()
). Additionally, our version ofdisp_str()
never had the behavior presented in the docstring, so I replaced it with something more sensible.That's about it. Tests prove no functional change.