Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Cleanup split between lexer and sliding-text-window#79205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
CyrusNajmabadi wants to merge88 commits intodotnet:main
base:main
Choose a base branch
Loading
fromCyrusNajmabadi:lexerField

Conversation

CyrusNajmabadi
Copy link
Member

The general philosophy here is that the SlidingTextWindow is just a representation of characters, and doesn't really care anything about "current lexeme". Instead, that entire concept it contained entirely within the lexer itself.

This helps allow us to tweak these components with less risk of regression (like the recent regression around lexing out adot and ensuring it was not a.. in a range).

@@ -15,6 +13,7 @@ internal class AbstractLexer : IDisposable
{
internal readonly SlidingTextWindow TextWindow;
private List<SyntaxDiagnosticInfo>? _errors;
protected int LexemeStartPosition;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

moved to lexer-layer from SlidingTextWindow

=> TextWindow.GetText(LexemeStartPosition, intern: false);

protected string GetInternedLexemeText()
=> TextWindow.GetText(LexemeStartPosition, intern: true);
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

these .GetText helpers on the TextWindow were used all over the place. MOved to two simple helpers on Lexer for getting either an interned or non-interned chunk of text for the current lexeme.

=> TextWindow.GetText(LexemeStartPosition, intern: true);

protected int CurrentLexemeWidth
=> this.TextWindow.Position - LexemeStartPosition;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

this was previously exposed as the (imo) confusingly named SlidingTextWindow.Width.

I tried to ensure that helpers related to lexemes used that term in their name to keep it clear what 'width/count/etc.' things are referring to.

internal SyntaxTrivia LookupWhitespaceTrivia(
SlidingTextWindow textWindow,
int lexemeStartPosition,
int hashCode)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

easier to just read as the new function. There was only ever one caller ofr LookupTrivia, and it always used the same generic arg.

@@ -87,7 +87,7 @@ private void ScanRawStringLiteral(ref TokenInfo info, bool inDirective)
// trusting the contents.
if (this.HasErrors)
{
var afterStartDelimiter =TextWindow.LexemeStartPosition + startingQuoteCount;
var afterStartDelimiter =this.LexemeStartPosition + startingQuoteCount;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Reference totextWindow.LexemeStartPosition were mechanically replaced withthis.LexemeStartPosition

@@ -120,7 +120,7 @@ private void ScanRawStringLiteral(ref TokenInfo info, bool inDirective)
};
}

info.Text =TextWindow.GetText(intern: true);
info.Text =this.GetInternedLexemeText();
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

References toTextWindow.GetText(intern: true) andTextWindow.GetText(intern: false) were mechanically updated tothis.GetInternedLexemeText() andthis.GetNonInternedLexemeText() mechanically.


// The max index in charWindow that we will quick scan to. This is either the end of the window
// or the position of the largest token we'd be willing to quick scan and cache.
var maxIndexInWindow = Math.Min(charWindow.Length, startIndexInWindow + MaxCachedTokenSize);
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

renamed these all for clarity.

return _characterWindow;
}
}
public readonly ReadOnlySpan<char> CurrentWindowSpan => _characterWindow.AsSpan(_positionInText - _characterWindowStartPositionInText);
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

exposing the characters outwards (For a few lexer fast-paths) is now much simpler and safer.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

note that we do NOT expose CharacterWindow itself. No clients actually need it, and then it becomes much more confusing for them how to properly index into it. Instead, we just expose the characters available at the current position and beyodn that (up to whatever it in the array segment).

/// </summary>
publicint Width
publicreadonly ReadOnlySpan<char> GetTextOfValidSpan(TextSpan span)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

another option would be to make this a TryGetText helper. that way there isn't this tight pair of this and SpanIsWithinWindow

@CyrusNajmabadiCyrusNajmabadi marked this pull request as ready for reviewJuly 3, 2025 22:47
@CyrusNajmabadiCyrusNajmabadi requested a review froma team as acode ownerJuly 3, 2025 22:47
}

protected void Start()
{
TextWindow.Start();
LexemeStartPosition = this.TextWindow.Position;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

the text window is now blissfully unaware of what the lexer is doing with lexemes.

return true;
// Otherwise, ensure that the character window contains this position so we can read characters directly
// from there.
this.ReadChunkAt(position);
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

logic is now so much simpler. especially not needing to figure out how to ensure lexeme positioning is still accurate.

// chunk.
return this.Text[this.Position - 1];
}
=> PeekChar(-1);
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

no need for special logic. it's now trivial to read fromany position in the text. so we just take advantage of that here. 4095/4096 times this will be free. 1/4096 times it won't be. that's totally fine to not care about us going backward and reading in a new chunk from the soruce text.

@@ -4577,15 +4577,20 @@ disabled text 2
[Fact, WorkItem("https://github.com/dotnet/roslyn/issues/78593")]
public void TestDotPrefixedNumberStartingAtStartOfSlidingTextWindow()
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

even though the window is now handle very differently, i still kept this test, but augmented it. it's still worth validating that if we need to peek back one char, and taht reads in a different chunk of text from the source-text, that that works properly.

/// <summary>
/// Not readonly. This is a mutable struct that will be modified as we lex tokens.
/// </summary>
internal SlidingTextWindow TextWindow;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Documented later. But this makes it so that there really need to be practically no indirections between the lexer and the chars being read msot of hte time.


if (value == null)
if (textWindow.TryGetTextIfWithinWindow(span, out var lexemeTextSpan))
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

note the user of spans here now, which is really nice for having the text window pass out the span of characters of interest, which then is exactly what the TriviaMap operates on.

Copy link
MemberAuthor

@CyrusNajmabadiCyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Comment

@@ -11,24 +11,27 @@ namespace Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax
// separate out text windowing implementation (keeps scanning & lexing functions from abusing details)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

i would view this PR with whitespace off.

/// grab in the next chunk of characters from the source text, as only one in every 65 lexemes
/// will need to go back to the source-text to read the full lexeme.
/// </summary>
public const int DefaultWindowLength = 4096;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That said, there's a virtue here of havin ga smaller number and exercising the transition points better in tests with code that goes over this length.

@CyrusNajmabadi
Copy link
MemberAuthor

@dotnet/roslyn-compiler this is ready for review.

Copy link
Member

@333fred333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Really really like the simplification here. Left a few comments, but nothing major.

/// grab in the next chunk of characters from the source text, as only one in every 65 lexemes
/// will need to go back to the source-text to read the full lexeme.
/// </summary>
public const int DefaultWindowLength = 4096;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Would it be worth using a different value in DEBUG vs RELEASE? Better test coverage in DEBUG with lower values, better perf in RELEASE with higher values?

Copy link
MemberAuthor

@CyrusNajmabadiCyrusNajmabadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

On it.

/// grab in the next chunk of characters from the source text, as only one in every 65 lexemes
/// will need to go back to the source-text to read the full lexeme.
/// </summary>
public const int DefaultWindowLength = 4096;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd possibly prefer that in a followup PR though. what think you?

@@ -185,7 +185,9 @@ public readonly bool SpanIsWithinWindow(TextSpan span)
/// <paramref name="span"/> is entirely within bounds of the current character window (see <see
/// cref="SpanIsWithinWindow(TextSpan)"/>). Otherwise, returns <see langword="false"/>. Used to allow
/// fast path access to a sequence of characters in the common case where they are in the window, while
/// falling back if they are not.
/// falling back if they are not. Note: this does not mutate the window in any way, it just reads from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

My problem here was with the wording "falling back". There is no fallback in this method. It just returns false.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

gotcha. will fixup.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

i tweaked the wording. teh 'fall back' concept was about what hte caller would need to do in that case, not the callee. hopeffully it is clearer now.

@CyrusNajmabadi
Copy link
MemberAuthor

@dotnet/roslyn-compiler for another pair of eyes.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@333fred333fred333fred approved these changes

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

2 participants
@CyrusNajmabadi@333fred

[8]ページ先頭

©2009-2025 Movatter.jp