Commit efb68b8

dsyme, KevinRansom authored and committed
add notes on FsLexYacc (dotnet#6000)
1 parent d4432b9, commit efb68b8


5 files changed: +236 -93 lines changed


src/buildtools/README-fslexyacc.md

Lines changed: 74 additions & 0 deletions
# Notes on FsLex and FsYacc

For better or worse, the F# compiler contains three tokenizers (`*.fsl`) and three
grammars (`*.fsy`) implemented using FsLex and FsYacc respectively, including the all-important F# grammar itself.
The canonical home for FsLex and FsYacc is http://github.com/fsprojects/FsLexYacc.
FsLex and FsYacc are themselves built using earlier versions of FsLex and FsYacc.

**If you would like to improve, modify, extend, test or document these
tools, generally please do so in that repository. There are some exceptions; see below.**

The `src\buildtools\fslex` and `src\buildtools\fsyacc` directories are an _exact_ copy of `packages\FsLexYacc.XYZ\src\fslex` and `packages\FsLexYacc.XYZ\src\fsyacc`. We should really verify this as part of our build.
This copy is done because we needed to have a build-from-source story.
In build-from-source, the only tool we can assume is an install of the .NET SDK.
That means we have to build up FsLex and FsYacc from scratch, _including_ their own generated fslexlex.fs, fslexpars.fs and so on.
We can't pick up the source from "packages" because in a build-from-source scenario we can't even fetch those
packages: we really have to build from just our source tree and the .NET SDK.
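The paragraph above notes that the exact-copy property "should really" be verified as part of the build. As a rough sketch of such a check (a hypothetical F# script helper; nothing like it is confirmed to exist in the repository), `diffTrees` compares two directory trees and lists the relative paths that exist on only one side or whose bytes differ. The `XYZ` package version stays elided; a caller would pass the real FsLexYacc package source directory and `src/buildtools/fslex` (and likewise for `fsyacc`).

```fsharp
open System.IO

/// Hypothetical helper (not part of the F# repo's build): returns the relative paths that
/// differ between two directory trees, including files present on only one side.
let diffTrees (expectedRoot: string) (actualRoot: string) : string list =
    let relativeFiles (root: string) =
        Directory.GetFiles(root, "*", SearchOption.AllDirectories)
        |> Array.map (fun path -> Path.GetRelativePath(root, path))
        |> Set.ofArray

    let expected = relativeFiles expectedRoot
    let actual = relativeFiles actualRoot

    // A file that exists on only one side counts as a difference.
    let onlyOneSide =
        Set.union (Set.difference expected actual) (Set.difference actual expected)

    // Files present on both sides are compared byte-for-byte
    // (F# '=' / '<>' on arrays compares element by element).
    let changed =
        Set.intersect expected actual
        |> Set.filter (fun rel ->
            let a = File.ReadAllBytes(Path.Combine(expectedRoot, rel))
            let b = File.ReadAllBytes(Path.Combine(actualRoot, rel))
            a <> b)

    Set.union onlyOneSide changed |> Set.toList
```

A build step could fail whenever the returned list is non-empty. Byte-for-byte comparison is deliberately strict: a checkout that normalizes line endings (for example via Git's `core.autocrlf`) may report differences that a text-level comparison would ignore.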

Please do _not_ modify the code in these directories except by copying it over from an upgraded FsLexYacc package.
Without the testing and documentation in the `FsLexYacc` repo, this copied code is just a bunch of untested, undocumented and
largely generated code checked into our source tree.

## What if I want to modify or improve FsLex and FsYacc?

First, be clear on what you want to do:

1. You might want to update the _code generators_ for the fslex or fsyacc tools.

2. You might want to update the _runtime_ of the fslex or fsyacc tools.

For (1), to improve the code/table generators, make a PR to the `FsLexYacc` repository and go through the cycle of updating these files to match a package upgrade.

For (2): for FsLexYacc-based tools, the runtime is normally either a source inclusion of `Lexing.fs`, `Lexing.fsi`, `Parsing.fs` and `Parsing.fsi`, or a reference to the `FsLexYacc.Runtime` package. The runtime contains `LexBuffer` and the lexing/parsing table interpreters.

However, long ago we decided to duplicate and ingest the _runtime_ files for FsLex and FsYacc into the F# compiler rather than taking them directly from the FsLexYacc project. This was mainly because we wanted to squeeze optimizations out of them based on profiling and simplify them a bit. The duplicated files are `prim-lexing.fs`, `prim-parsing.fs` and the corresponding `.fsi` files in `src/utils`. These files are sufficient to implement the contracts expected by the FsLex/FsYacc generated code, and require exactly the same table formats as generated by FsLex/FsYacc.
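To illustrate why the ingested runtime must agree exactly with the table format FsLex generates, here is a minimal sketch of a table-driven, longest-match lexer loop. The `ToyTables` layout (dense `int` arrays with `-1` meaning "none"), the `interpret` function and the example tables are all hypothetical and much simpler than the `uint16` tables accepted by `UnicodeTables.Create`; the point is only that the interpreter derives all meaning from array positions, so generator and runtime must share one encoding.

```fsharp
/// Hypothetical toy lexer tables. The dense layout (int arrays, -1 meaning "none") is invented
/// for illustration and is NOT the uint16 encoding that FsLex emits.
type ToyTables =
    { Transitions: int[][]     // Transitions.[state].[charClass] = next state, or -1 if none
      Accept: int[]            // Accept.[state] = action index if the state accepts, else -1
      CharClass: char -> int } // maps an input character to a column of Transitions

/// Longest-match interpretation: run the automaton from initialState at offset start and return
/// Some (actionIndex, lexemeLength) for the longest accepted prefix, or None if nothing matches.
let interpret (tables: ToyTables) (initialState: int) (input: string) (start: int) =
    let rec loop (state: int) (pos: int) (lastAccept: (int * int) option) =
        // Remember the most recent accepting state so the lexer can fall back to it.
        let lastAccept =
            if tables.Accept.[state] >= 0 then Some(tables.Accept.[state], pos - start)
            else lastAccept
        if pos >= input.Length then
            lastAccept
        else
            let next = tables.Transitions.[state].[tables.CharClass(input.[pos])]
            if next < 0 then lastAccept else loop next (pos + 1) lastAccept
    loop initialState start None

// Example: action 0 = identifier (one or more letters), action 1 = number (one or more digits).
// Character classes: 0 = letter, 1 = digit, 2 = anything else.
let classify (c: char) =
    if System.Char.IsLetter c then 0
    elif System.Char.IsDigit c then 1
    else 2

let toyTransitions =
    [| [| 1; 2; -1 |]    // state 0: start
       [| 1; -1; -1 |]   // state 1: inside an identifier (accepts action 0)
       [| -1; 2; -1 |] |] // state 2: inside a number (accepts action 1)

let toyTables =
    { Transitions = toyTransitions; Accept = [| -1; 0; 1 |]; CharClass = classify }
```

With these tables, `interpret toyTables 0 "abc123" 0` yields `Some (0, 3)` (a three-character identifier) and starting at offset 3 yields `Some (1, 3)` (a three-digit number). Swapping two columns in the table without changing `classify` would silently change the lexer's behaviour, which is the kind of coupling the paragraph above describes.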

This means you can improve some aspects of the _runtime_ for FsLex and FsYacc by making direct changes to `prim-lexing.fs` and `prim-parsing.fs`.

For example, the _actual_ `LexBuffer` type used in the F# compiler (for all three lexers and grammars) is this one: https://github.com/Microsoft/visualfsharp/blob/master/src/utils/prim-lexing.fsi#L50. That version of the Lex/Yacc runtime has added some things, for example `BufferLocalStore`, which we use for the `XmlDoc` accumulator as we strip those out. It has also dropped any mention of async lexing and any mention of `byte`. The use
of generics for `LexBuffer<'Char>` is also superfluous, because `'Char` is always `char`, but it is needed because the FsLex/FsYacc generated code expects this type to be generic.
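A rough sketch of the two points in the previous paragraph, using a hypothetical `ToyLexBuffer` rather than the compiler's `LexBuffer`: the type parameter exists only so the shape matches what generated code expects (every use instantiates it at `char`), and a per-buffer, dynamically typed dictionary, modelled on the `BufferLocalStore` idea, holds state such as a doc-comment accumulator. The key name and the `getDocAccumulator` helper are assumptions for illustration, not the compiler's actual `XmlDoc` code.

```fsharp
open System.Collections.Generic

/// Hypothetical, minimal stand-in for a lex buffer. It is generic in 'Char only to mirror the
/// shape that generated code expects; every use below instantiates it at char.
type ToyLexBuffer<'Char>(chars: 'Char[]) =
    let store = Dictionary<string, obj>()
    member _.Chars = chars
    /// Dynamically typed, per-buffer state, in the spirit of BufferLocalStore.
    member _.BufferLocalStore: IDictionary<string, obj> = store :> IDictionary<string, obj>

/// Hypothetical key used to find the accumulator in the store.
let docCommentKey = "DocComments"

/// Hypothetical helper: fetch (or lazily create) the per-buffer list of collected doc-comment lines.
let getDocAccumulator (buf: ToyLexBuffer<char>) : ResizeArray<string> =
    match buf.BufferLocalStore.TryGetValue(docCommentKey) with
    | true, (:? ResizeArray<string> as acc) -> acc
    | _ ->
        let acc = ResizeArray<string>()
        buf.BufferLocalStore.[docCommentKey] <- box acc
        acc
```

Because the store lives on the buffer, each lexer run gets its own accumulator without threading extra parameters through the generated lexer code; the cost is a dynamic type test on every lookup.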

## What if I want to eradicate our use of FsLex and FsYacc?

The use of FsLex and FsYacc in this repo is somewhat controversial, since the C# compiler implementation uses hand-written lexers and parsers.

On balance, the use of FsLex is fairly reasonable and unlikely to change, though moving to an alternative tokenization technique wouldn't be
overly difficult given the declarative nature of `FsLex` tokenization.

The use of a table-driven LALR(1) parser is more controversial: there is a general feeling that it would be great to
somehow move on from FsYacc and do parsing some other way. However, it is not at all easy to do that and remain
fully compatible. For this reason it is unlikely we will remove the use of FsYacc any time soon. That said, incremental
modifications to extract more information from the grammar may yield good results.

## Why aren't FsLex and FsYacc just ingested into this repo if we depend on them (and even have an exact copy of them for build-from-source)?

FsLex and FsYacc are non-trivial tools that require documentation and testing. Also, for external users, they require packaging. Changes to their design should be
considered carefully. While we are open to adding features to these tools specifically for use by the F# compiler, the tools are open source and available
independently. For these reasons it is generally best that these tools live in their own repository.

The copy of the `fslex` and `fsyacc` source code in `buildtools` is an exact copy and is not tested or documented
apart from what has already been done in the FsLexYacc repo. Adjusting these copies is not allowed and would be wrong from an engineering perspective,
because there's no place to put documentation or tests.

Occasionally we discuss ingesting FsLex and FsYacc into this repository. This often comes up in the hope that by doing so
we can somehow eventually code-fold them away until we no longer require them at all, instead moving to hand-written parsers
and lexers. That's an admirable goal. However, moving the tools into this repo doesn't actually help with eliminating their
use, and may indeed make it harder. This is because these tools use table generation
based on very specific lexer/grammar specifications. The tables are unreadable and unmaintainable. You can't just
somehow "specialize" the tools to the F# grammar and then get rid of them, as this doesn't give a useful, maintainable lexer or parser.
To our knowledge there is no way to convert an LALR(1) parser specification to readable, maintainable recursive descent parsing code.
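To make the point about tables concrete, here is a toy LR parse table, built by hand, with a driver for the two-rule grammar `E -> E + n | n`. It is not FsYacc output and does not use FsYacc's table format; all names are hypothetical. It only shows that, once tabulated, the grammar survives as opaque state numbers in `actionTable`, which is why a generated parser cannot simply be "specialized" into readable code.

```fsharp
/// Tokens for the toy grammar  E -> E + n | n
type Token =
    | Num of int
    | Plus
    | Eof

/// The entries of an LR action table.
type Action =
    | Shift of int   // push the token and move to the given state
    | Reduce of int  // reduce by the given production number
    | Accept
    | Reject         // syntax error

// Hand-built table for the productions (1) E -> E + n and (2) E -> n.
// Columns: 0 = n, 1 = +, 2 = end of input. The grammar survives only as state numbers.
let actionTable: Action[][] =
    [| [| Shift 2; Reject; Reject |]     // state 0: start
       [| Reject; Shift 3; Accept |]     // state 1: after E
       [| Reject; Reduce 2; Reduce 2 |]  // state 2: after n
       [| Shift 4; Reject; Reject |]     // state 3: after E +
       [| Reject; Reduce 1; Reduce 1 |] |] // state 4: after E + n

/// Goto table for the only nonterminal, E. In this grammar a reduction always uncovers state 0.
let gotoE (state: int) =
    if state = 0 then 1 else failwithf "no goto on E from state %d" state

let column (tok: Token) =
    match tok with
    | Num _ -> 0
    | Plus -> 1
    | Eof -> 2

let valueOf (tok: Token) =
    match tok with
    | Num n -> n
    | _ -> 0

/// Table-driven LR driver that evaluates the sum. Each stack entry pairs a state with a semantic value.
let parse (tokens: Token list) : int option =
    let rec step (stack: (int * int) list) (input: Token list) =
        match stack, input with
        | [], _ | _, [] -> None
        | (state, _) :: _, tok :: rest ->
            match actionTable.[state].[column tok], stack with
            | Shift next, _ -> step ((next, valueOf tok) :: stack) rest
            // E -> E + n: pop three entries (n, +, E) and push goto(E) with the sum.
            | Reduce 1, (_, n) :: _ :: (_, e) :: ((s, _) :: _ as below) -> step ((gotoE s, e + n) :: below) input
            // E -> n: pop one entry and push goto(E) with its value.
            | Reduce 2, (_, n) :: ((s, _) :: _ as below) -> step ((gotoE s, n) :: below) input
            | Accept, (_, e) :: _ -> Some e
            | _ -> None
    step [ (0, 0) ] (tokens @ [ Eof ])
```

For example, `parse [ Num 1; Plus; Num 2 ]` returns `Some 3`, and an incomplete input such as `[ Num 1; Plus ]` hits a `Reject` entry and returns `None`. Nothing in `actionTable` says "addition"; that knowledge is spread across state numbers and reduce entries, and a full grammar spreads it across many more states.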

As a result, ingesting the tools into this repo (and modifying them here) would be counterproductive, as the tools would no longer be tested, documented or
maintained properly, and overall engineering quality would decrease. Furthermore, the bootstrap process for the repo would then become very unwieldy.

src/utils/prim-lexing.fs

Lines changed: 1 addition & 0 deletions
```diff
@@ -2,6 +2,7 @@

 #nowarn "47" // recursive initialization of LexBuffer

+// NOTE: the code in this file is a drop-in replacement runtime for Lexing.fs from the FsLexYacc repository

 namespace Internal.Utilities.Text.Lexing

```

src/utils/prim-lexing.fsi

Lines changed: 22 additions & 3 deletions
```diff
@@ -2,7 +2,10 @@

 // LexBuffers are for use with automatically generated lexical analyzers,
 // in particular those produced by 'fslex'.
-
+//
+// NOTE: the code in this file is a drop-in replacement runtime for Lexing.fsi from the FsLexYacc repository
+// and is referenced by generated code for the three FsLex generated lexers in the F# compiler.
+// The underlying table format intepreted must precisely match the format generated by FsLex.
 namespace Internal.Utilities.Text.Lexing

 open System.Collections.Generic
@@ -14,27 +17,35 @@ open Microsoft.FSharp.Control
 type internal Position =
     /// The file index for the file associated with the input stream, use <c>fileOfFileIndex</c> in range.fs to decode
     val FileIndex: int
+
     /// The line number in the input stream, assuming fresh positions have been updated
     /// for the new line by modifying the EndPos property of the LexBuffer.
     val Line: int
+
     /// The line number for the position in the input stream, assuming fresh positions have been updated
     /// using for the new line.
     val OriginalLine: int
+
     /// The character number in the input stream.
     val AbsoluteOffset: int
+
     /// Return absolute offset of the start of the line marked by the position.
     val StartOfLineAbsoluteOffset: int
+
     /// Return the column number marked by the position,
     /// i.e. the difference between the <c>AbsoluteOffset</c> and the <c>StartOfLineAbsoluteOffset</c>
     member Column: int
-    // Given a position just beyond the end of a line, return a position at the start of the next line.
+
+    /// Given a position just beyond the end of a line, return a position at the start of the next line.
     member NextLine: Position

     /// Given a position at the start of a token of length n, return a position just beyond the end of the token.
     member EndOfToken: n:int -> Position
+
     /// Gives a position shifted by specified number of characters.
     member ShiftColumnBy: by:int -> Position
-    // Same line, column -1.
+
+    /// Same line, column -1.
     member ColumnMinusOne: Position

     /// Apply a #line directive.
@@ -47,11 +58,15 @@ type internal Position =

 [<Sealed>]
 /// Input buffers consumed by lexers generated by <c>fslex.exe</c>.
+/// The type must be generic to match the code generated by FsLex and FsYacc(if you would like to
+/// fix this, please submit a PR to the FsLexYacc repository allowing for optional emit of a non-generic type reference).
 type internal LexBuffer<'Char> =
     /// The start position for the lexeme.
     member StartPos: Position with get,set
+
     /// The end position for the lexeme.
     member EndPos: Position with get,set
+
     /// The matched string.
     member Lexeme: 'Char[]

@@ -67,13 +82,17 @@ type internal LexBuffer<'Char> =
     /// Create a lex buffer suitable for Unicode lexing that reads characters from the given array.
     /// Important: does take ownership of the array.
     static member FromChars: char[] -> LexBuffer<char>
+
     /// Create a lex buffer that reads character or byte inputs by using the given function.
     static member FromFunction: ('Char[] * int * int -> int) -> LexBuffer<'Char>

 /// The type of tables for an unicode lexer generated by <c>fslex.exe</c>.
 [<Sealed>]
 type internal UnicodeTables =
+
+    /// Create the tables from raw data
     static member Create: uint16[][] * uint16[] -> UnicodeTables
+
     /// Interpret tables for a unicode lexer generated by <c>fslex.exe</c>.
     member Interpret: initialState:int * LexBuffer<char> -> int

```