aroberge (André Roberge) I very, very much like this proposal. However, I would suggest a small change. The PEP as currently written includes the following:
Maintaining the current behavior, only a single line will be displayed in tracebacks. For instructions that span multiple lines (the end offset and the start offset belong to different lines), the end offset will be set to 0 (meaning it is unavailable)
Other enhanced traceback packages [1] often display multiple lines surrounding the actual line where an error was located. Such packages might want to highlight any problem code spanning multiple lines. For such tools, it would be very useful to have the end offset NOT set to zero when it is on a different line. If the end line was recorded, CPython could simply use the fact that the end line was different from the beginning line the same way that “end offset == 0” is currently proposed; other packages could make use of the entire information as needed.
[1] An example of such a package is GitHub - aroberge/friendly: Aimed at Python beginners: replacing standard traceback by something easier to understand. In some cases, friendly already shows similar enhanced information about the location in tracebacks, as shown in the picture below. A list of other enhanced traceback packages can be found at the bottom of Some thoughts on the design of friendly — friendly-traceback 0.3.142 documentation
Thank you for bringing this up. We do mention in the rejected ideas section that something like this ought to be useful for external tools, but that it can't really be taken advantage of by CPython itself without making big changes to the traceback machinery.
Long term this might be useful to add (I think line end numbers or deltas would also compress fairly well since most bytecodes will just be on the same line), but I think we would like to keep it out of scope for this PEP to keep the implementation simple and the overhead low.
Jelle (Jelle Zijlstra) I've definitely been bitten by this issue before (not knowing which part of a complex expression threw an error). I'm excited that it's finally getting fixed!
I do have a piece of feedback. In this example from the PEP:
```
  File "test.py", line 6, in lel
    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                         ^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
```
my first instinct was that the highlighted expression x['z']['x']['y']['z'] was None. Thinking about it some more, I realized it actually means that x['z']['x']['y'] is None, because the caret points at the piece of AST being evaluated, not the part that's incorrectly None.
I’m worried that this will end up being a common user confusion. I don’t have a good suggestion for fixing it, though, so maybe it’s just something people will have to learn.
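For concreteness, here is a minimal reproduction of the situation Jelle describes (the nested dictionary is made up for illustration, not taken from the PEP):

```python
# Hypothetical data arranged so that x['z']['x']['y'] is None.
x = {'z': {'x': {'y': None}}}

try:
    # The final ['z'] subscript is what fails, because the object
    # being subscripted -- x['z']['x']['y'] -- is None.
    x['z']['x']['y']['z']
except TypeError as exc:
    print(exc)  # 'NoneType' object is not subscriptable
```

The highlighted range in the new tracebacks covers the whole failing subscript expression, which is why a reader's first instinct can be to think the highlighted expression itself is None.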
guido (Guido van Rossum) Maybe we should highlight the text most closely associated with the operation that fails rather than with the failing subexpression? E.g. if we have f() + g() and the + operation fails, the caret should only point at +. In Jelle's example (which also bothered me when I first saw it) I would highlight ['z'] only.
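One way an operator position like this could be derived: the compiler already has the operand positions from the AST, and the gap between the operands brackets the operator itself. A quick sketch with the `ast` module (my own illustration; end offsets require Python 3.8+):

```python
import ast

tree = ast.parse("f() + g()", mode="eval")
binop = tree.body

# The BinOp node spans the whole expression...
print(binop.col_offset, binop.end_col_offset)             # 0 9
# ...but the gap between its operands pinpoints the '+' itself.
print(binop.left.end_col_offset, binop.right.col_offset)  # 3 6
```

So in principle the span between columns 3 and 6 (the `+` and its surrounding spaces) could be highlighted without storing any extra data beyond what the AST already records.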
pablogsal (Pablo Galindo Salgado) That's a good idea. I think the best approach is some hybrid where we highlight the range and some position inside it. Many people mentioned that a single caret can be hard to read, especially for people with impaired vision, which motivated this change.
We will explore that, but one problem I can see is that it requires more manual intervention in the compiler and has more room for error. On the other hand, the current approach is still complex, but we can reuse a lot of machinery and always get correct ranges (as long as the line numbers are correct). Given how many inaccuracies the old manual AST positions have, we are advocating for the least amount of manual work, at least for the first version of this.
pablogsal (Pablo Galindo Salgado) Thinking about it some more, I realized it actually means that x['z']['x']['y'] is None, because the caret points at the piece of AST being evaluated, not the part that's incorrectly None.
Indeed, we will think about it, but fixing this is not trivial. Notice that the error is not saying "this is None" but that "None cannot be indexed". Therefore the highlighted range is the indexing operation, not the part that is None.
It is the same as if you do

```
X = None
X[7]
```

What is wrong is X[7], not X.

One of the important points of preserving the AST ranges is that if you take the code represented by the range, it will always be a valid syntactic structure, while in the original example ['z'] is not (it is actually valid as a list literal, but that would be the wrong interpretation).
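Pablo's point can be checked directly with the `ast` module: the full reported range parses as the subscript expression that actually failed, while the bare `['z']` parses as something else entirely (this snippet is my own illustration, not from the PEP):

```python
import ast

# The full highlighted range parses as the failing subscript expression...
node = ast.parse("x['z']['x']['y']['z']", mode="eval").body
print(type(node).__name__)   # Subscript

# ...whereas bare "['z']" also parses, but as a list literal --
# a different (and wrong) syntactic structure.
node = ast.parse("['z']", mode="eval").body
print(type(node).__name__)   # List
```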
In any case, we can fine-tune this as part of the implementation, but it doesn't really modify the proposal, so I would prefer to discuss it later if the PEP is accepted.
guido (Guido van Rossum) (Meta: can you use quoting instead of saying "this" or "that"? Messages may appear that make such references ambiguous.)
pablogsal (Pablo Galindo Salgado) (Meta: can you use quoting instead of saying "this" or "that"? Messages may appear that make such references ambiguous.)
I will do it in my next messages, but I'm writing this on my phone and that makes quoting very hard.
cjerdonek (Chris Jerdonek) The rationale says the data types were chosen in a way that tries to minimize the impact on the size of code objects in memory. However, it doesn't say whether variable-length encodings were seriously considered. It seems likely to me that a variable-length encoding could permit offsets bigger than 255 to be represented without increasing memory over the current proposal for most projects. It even seems possible that a more clever encoding could be chosen that uses less memory than the current proposal.
In the schemes I'm thinking of, the two offsets would be stored as (start_offset, length) across one or more bytes. In the simplest scheme, if the first bit is 0, then the data for both offsets would be contained in two bytes: the next 8 bits would be the start offset, and the remaining 7 the length. That would let you represent start offsets up to 255 and lengths up to 127 in two bytes.
In a more clever scheme, the most common "rectangle" of (start_offset, length) pairs could be represented in one byte, pairs up to (127, 127) in two bytes (encompassing all PEP 8-compliant code), and larger pairs in three or more bytes. That could potentially be more memory-efficient than the current proposal. However, it would depend on knowing more about the real-world distribution of (start_offset, length) pairs.
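A rough sketch of the simplest two-byte scheme described above (the function names and exact bit layout are my own illustration, not part of any proposal):

```python
def encode_pair(start_offset, length):
    """Pack (start_offset, length) into two bytes: a 0 flag bit,
    8 bits of start offset (0-255), and 7 bits of length (0-127)."""
    assert 0 <= start_offset <= 255 and 0 <= length <= 127
    word = (start_offset << 7) | length   # 15 payload bits; flag bit stays 0
    return bytes([word >> 8, word & 0xFF])

def decode_pair(data):
    assert data[0] & 0x80 == 0            # flag bit 0 => two-byte form
    word = (data[0] << 8) | data[1]
    return (word >> 7) & 0xFF, word & 0x7F

print(decode_pair(encode_pair(200, 42)))  # (200, 42)
```

A flag bit of 1 would then signal a longer (three-or-more-byte) form for the rare pairs that don't fit, in the spirit of the varint-style encodings used elsewhere.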
nas (Neil Schemenauer) I think this is a great idea and will bring quite a big improvement for debugging. The small space cost (+22%?) is minor relative to the value of the feature. Putting the table in a separate section of the .pyc file seems like a nice optimization. If we don't want to complicate the interpreter with extra options to exclude it, perhaps we could provide a tool that strips that table from .pyc files. Then, someone who is very space constrained could use that tool. It would be somewhat similar to the code minifiers used for JavaScript.
That misinterpretation bit me too. It bit me so hard I had to read Jelle's explanation three times before I got it. So I agree that this will be a common user confusion.
Quoting is also hard from email. Discuss seems to strip email quotes (lines beginning with one or more '>') and I haven't yet worked out what sort of markup to use in its place.