- Notifications
You must be signed in to change notification settings - Fork4.1k
Cleanup split between lexer and sliding-text-window#79205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
@@ -15,6 +13,7 @@ internal class AbstractLexer : IDisposable | |||
{ | |||
internal readonly SlidingTextWindow TextWindow; | |||
private List<SyntaxDiagnosticInfo>? _errors; | |||
protected int LexemeStartPosition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
moved to lexer-layer from SlidingTextWindow
=> TextWindow.GetText(LexemeStartPosition, intern: false); | ||
protected string GetInternedLexemeText() | ||
=> TextWindow.GetText(LexemeStartPosition, intern: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
these .GetText helpers on the TextWindow were used all over the place. MOved to two simple helpers on Lexer for getting either an interned or non-interned chunk of text for the current lexeme.
=> TextWindow.GetText(LexemeStartPosition, intern: true); | ||
protected int CurrentLexemeWidth | ||
=> this.TextWindow.Position - LexemeStartPosition; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
this was previously exposed as the (imo) confusingly named SlidingTextWindow.Width.
I tried to ensure that helpers related to lexemes used that term in their name to keep it clear what 'width/count/etc.' things are referring to.
internal SyntaxTrivia LookupWhitespaceTrivia( | ||
SlidingTextWindow textWindow, | ||
int lexemeStartPosition, | ||
int hashCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
easier to just read as the new function. There was only ever one caller ofr LookupTrivia, and it always used the same generic arg.
@@ -87,7 +87,7 @@ private void ScanRawStringLiteral(ref TokenInfo info, bool inDirective) | |||
// trusting the contents. | |||
if (this.HasErrors) | |||
{ | |||
var afterStartDelimiter =TextWindow.LexemeStartPosition + startingQuoteCount; | |||
var afterStartDelimiter =this.LexemeStartPosition + startingQuoteCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Reference totextWindow.LexemeStartPosition
were mechanically replaced withthis.LexemeStartPosition
@@ -120,7 +120,7 @@ private void ScanRawStringLiteral(ref TokenInfo info, bool inDirective) | |||
}; | |||
} | |||
info.Text =TextWindow.GetText(intern: true); | |||
info.Text =this.GetInternedLexemeText(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
References toTextWindow.GetText(intern: true)
andTextWindow.GetText(intern: false)
were mechanically updated tothis.GetInternedLexemeText()
andthis.GetNonInternedLexemeText()
mechanically.
// The max index in charWindow that we will quick scan to. This is either the end of the window | ||
// or the position of the largest token we'd be willing to quick scan and cache. | ||
var maxIndexInWindow = Math.Min(charWindow.Length, startIndexInWindow + MaxCachedTokenSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
renamed these all for clarity.
return _characterWindow; | ||
} | ||
} | ||
public readonly ReadOnlySpan<char> CurrentWindowSpan => _characterWindow.AsSpan(_positionInText - _characterWindowStartPositionInText); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
exposing the characters outwards (For a few lexer fast-paths) is now much simpler and safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
note that we do NOT expose CharacterWindow itself. No clients actually need it, and then it becomes much more confusing for them how to properly index into it. Instead, we just expose the characters available at the current position and beyodn that (up to whatever it in the array segment).
/// </summary> | ||
publicint Width | ||
publicreadonly ReadOnlySpan<char> GetTextOfValidSpan(TextSpan span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
another option would be to make this a TryGetText helper. that way there isn't this tight pair of this and SpanIsWithinWindow
} | ||
protected void Start() | ||
{ | ||
TextWindow.Start(); | ||
LexemeStartPosition = this.TextWindow.Position; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
the text window is now blissfully unaware of what the lexer is doing with lexemes.
return true; | ||
// Otherwise, ensure that the character window contains this position so we can read characters directly | ||
// from there. | ||
this.ReadChunkAt(position); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
logic is now so much simpler. especially not needing to figure out how to ensure lexeme positioning is still accurate.
// chunk. | ||
return this.Text[this.Position - 1]; | ||
} | ||
=> PeekChar(-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
no need for special logic. it's now trivial to read fromany position in the text. so we just take advantage of that here. 4095/4096 times this will be free. 1/4096 times it won't be. that's totally fine to not care about us going backward and reading in a new chunk from the soruce text.
@@ -4577,15 +4577,20 @@ disabled text 2 | |||
[Fact, WorkItem("https://github.com/dotnet/roslyn/issues/78593")] | |||
public void TestDotPrefixedNumberStartingAtStartOfSlidingTextWindow() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
even though the window is now handle very differently, i still kept this test, but augmented it. it's still worth validating that if we need to peek back one char, and taht reads in a different chunk of text from the source-text, that that works properly.
Uh oh!
There was an error while loading.Please reload this page.
/// <summary> | ||
/// Not readonly. This is a mutable struct that will be modified as we lex tokens. | ||
/// </summary> | ||
internal SlidingTextWindow TextWindow; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Documented later. But this makes it so that there really need to be practically no indirections between the lexer and the chars being read msot of hte time.
if (value == null) | ||
if (textWindow.TryGetTextIfWithinWindow(span, out var lexemeTextSpan)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
note the user of spans here now, which is really nice for having the text window pass out the span of characters of interest, which then is exactly what the TriviaMap operates on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Comment
@@ -11,24 +11,27 @@ namespace Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax | |||
// separate out text windowing implementation (keeps scanning & lexing functions from abusing details) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
i would view this PR with whitespace off.
/// grab in the next chunk of characters from the source text, as only one in every 65 lexemes | ||
/// will need to go back to the source-text to read the full lexeme. | ||
/// </summary> | ||
public const int DefaultWindowLength = 4096; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That said, there's a virtue here of havin ga smaller number and exercising the transition points better in tests with code that goes over this length.
@dotnet/roslyn-compiler this is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Really really like the simplification here. Left a few comments, but nothing major.
/// grab in the next chunk of characters from the source text, as only one in every 65 lexemes | ||
/// will need to go back to the source-text to read the full lexeme. | ||
/// </summary> | ||
public const int DefaultWindowLength = 4096; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Would it be worth using a different value in DEBUG vs RELEASE? Better test coverage in DEBUG with lower values, better perf in RELEASE with higher values?
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
On it.
/// grab in the next chunk of characters from the source text, as only one in every 65 lexemes | ||
/// will need to go back to the source-text to read the full lexeme. | ||
/// </summary> | ||
public const int DefaultWindowLength = 4096; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'd possibly prefer that in a followup PR though. what think you?
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
@@ -185,7 +185,9 @@ public readonly bool SpanIsWithinWindow(TextSpan span) | |||
/// <paramref name="span"/> is entirely within bounds of the current character window (see <see | |||
/// cref="SpanIsWithinWindow(TextSpan)"/>). Otherwise, returns <see langword="false"/>. Used to allow | |||
/// fast path access to a sequence of characters in the common case where they are in the window, while | |||
/// falling back if they are not. | |||
/// falling back if they are not. Note: this does not mutate the window in any way, it just reads from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
My problem here was with the wording "falling back". There is no fallback in this method. It just returns false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
gotcha. will fixup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
i tweaked the wording. teh 'fall back' concept was about what hte caller would need to do in that case, not the callee. hopeffully it is clearer now.
@dotnet/roslyn-compiler for another pair of eyes. |
The general philosophy here is that the SlidingTextWindow is just a representation of characters, and doesn't really care anything about "current lexeme". Instead, that entire concept it contained entirely within the lexer itself.
This helps allow us to tweak these components with less risk of regression (like the recent regression around lexing out a
dot
and ensuring it was not a..
in a range).