Commitefb68b8

dsyme

authored and

KevinRansom

committed

add notes on FsLexYacc (dotnet#6000)

* add notes on FsLexYacc* add notes on FsLexYacc* add notes on FsLexYacc* add notes on FsLexYacc* add notes on FsLexYacc

1 parentd4432b9 commitefb68b8Copy full SHA for efb68b8

File tree

5 files changed

+236

-93

lines changed

src
- buildtools
  - README-fslexyacc.md
- utils

5 files changed

+236

-93

lines changed

`‎src/buildtools/README-fslexyacc.md‎`

Lines changed: 74 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,74 @@`
	`1`	`+#Notes on FsLex and FsYacc`
	`2`	`+`
	`3`	+For better or worse the F# compiler contains three tokenizers (`*.fsl`) and three
	`4`	+grammars (`*.fsy`) implemented using FsLex and FsYacc respectively, including the all-important F# grammar itself.
	`5`	`+The canonical home for FsLex and FsYacc ishttp://github.com/fsprojects/FsLexYacc.`
	`6`	`+FsLex and FsYacc are themselves built using earlier versions of FsLex and FsYacc.`
	`7`	`+`
	`8`	`+**If you would like to improve, modify, extend, test or document these`
	`9`	`+tools, generally please do so in that repository. There are some exceptions, see below.**`
	`10`	`+`
	`11`	+The`src\buildtools\fslex` and`src\buildtools\fsyacc` directories are an_exact_ copy of`packages\FsLexYacc.XYZ\src\fslex` and`packages\FsLexYacc.XYZ\src\fsyacc`. We should really verify this as part of our build.
	`12`	`+This copy is done because we needed to have a build-from-source story.`
	`13`	`+In build-from-source, the only tool we can assume is an install of the .NET SDK.`
	`14`	`+That means we have to build up FsLex and FsYacc from scratch,_including_ their own generated fslexlex.fs, fslexpars.fs and so on.`
	`15`	`+We can't pick up the source from "packages" because in a build-from-source scenario we can't even fetch those`
	`16`	`+packages - we really have to build from just our source tree and .NET SDK.`
	`17`	`+`
	`18`	`+Please do_not_ modify the code in these directories except by copying over from an upgraded FsLexYacc pacakge.`
	`19`	+Without the testing and documentation in the`FsLexYacc` repo, this copied code is just a bunch of untested, undocumented and
	`20`	`+largely generated code checked into our source tree.`
	`21`	`+`
	`22`	`+##What if I want to modify/improve FsLex and FsYacc`
	`23`	`+`
	`24`	`+First, be clear on what you want to do:`
	`25`	`+`
	`26`	`+1. You might want to update the_code generators_ for the fslex or fsyacc tools.`
	`27`	`+`
	`28`	`+2. You might want to update the_runtime_ of the fslex or fsyacc tools.`
	`29`	`+`
	`30`	+For (1), to improve the code/table generators, make a PR to the`FsLexYacc` repository and go through the cycle of updating these files to match a package upgrade.
	`31`	`+`
	`32`	+For (2), normally for FsLexYacc-based tools the runtime is either a source inclusion of`Lexing.fs`, Lexing.fsi, Parsing.fs, Parsing.fsi or a reference to the`FsLexYacc.Runtime` package. The runtime contains LexBuffer and the lexing/parsing table interpreters.
	`33`	`+`
	`34`	+However long ago we decided to duplicate and ingest the_runtime_ files for FsLex and FsYacc into the F# compiler rather than taking them directly from the FsLexYacc project. This was mainly because we wanted to squeeze optimizations out of them based on profiling and simplify them a bit. The duplicated files are`prim-lexing.fs`,`prim-parsing.fs` and the corresponding`.fsi` files in`src/utils`. These files are sufficient to implement the contracts exepcted by the FsLex/FsYacc generated code, and require exactly the same table formats as generated by FsLex/FsYacc.
	`35`	`+`
	`36`	+This means you can improve some aspects of the_runtime_ for FsLex and FsYacc by making direct changes to`prim-lexing.fs` and`prim-parsing.fs`.
	`37`	`+`
	`38`	+For example, the_actual_`LexBuffer` type being used in the F# compiler (for all three lexers and grammars) is this one:https://github.com/Microsoft/visualfsharp/blob/master/src/utils/prim-lexing.fsi#L50. (That version of the Lex/Yacc runtime has added some things:`BufferLocalStore` for example, which we use for the`XmlDoc` accumulator as we strip those out. It's also dropped any mention of async lexing, and any mention of`byte`. The use
	`39`	+of generics for`LexBuffer<'Char>` is also superfluous because`'Char` is always`char` but is needed because the FsLex/FsYacc generated code expects this type to be generic.)
	`40`	`+`
	`41`	`+##What if I want to eridicate our use of FsLex and FsYacc?`
	`42`	`+`
	`43`	`+The use of FsLex and FsYacc in this repo is somewhat controversial since the C# compiler implementation uses hand-written lexers and parsers.`
	`44`	`+`
	`45`	`+In the balance the use of FsLex is fairly reasonable and unlikely to change, though moving to an alternative tokenization technique wouldn't be`
	`46`	+overly difficult given the declarative nature of`FsLex` tokenization.
	`47`	`+`
	`48`	`+The use of a table-driven LALR(1) parser is more controversial: there is a general feeling that it would be great to`
	`49`	`+somehow move on from FsYacc and do parsing some other way. However, it is not at all easy to do that and remain`
	`50`	`+fully compatible. For this reason it is unlikely we will remove the use of FsYacc any time soon. However incremental`
	`51`	`+modifications to extract more information from the grammer may yield good results.`
	`52`	`+`
	`53`	`+##Why aren't FsLex and FsYacc just ingested into this repo if we depend on them (and even have an exact copy of them for build-from-source)?`
	`54`	`+`
	`55`	`+FsLex and FsYacc are non-trivial tools that require documentation and testing. Also, for external users, they require packaging. Changes to their design should be`
	`56`	`+considered carefully. While we are open to adding features to these tools specifically for use by the F# compiler, the tools are open source and available`
	`57`	`+independently. For these reasons it is generally best that these tools live in their own repository.`
	`58`	`+`
	`59`	+The copy of the`fslex` and`fsyacc` source code in`buildtools` is an exact copy and is not tested or documented
	`60`	`+apart from what's been done before in FsLexYacc repo. Adjusting these copies is not allowed and would be wrong from an engineering persepctive,`
	`61`	`+because there's no place to put documentation or tests.`
	`62`	`+`
	`63`	`+Occasionally we discuss ingesting FsLex and FsYacc into this repository. This often comes up in the hope that by doing so`
	`64`	`+we can somehow eventually code-fold them away until we no longer require them at all, instead moving to hand-written parsers`
	`65`	`+and lexers. That's an admirable goal. However, moving the tools into this repo doesn't actually help with eliminating their`
	`66`	`+use, and may indeed make it harder. This is because these tools use table generation`
	`67`	`+based on very specific lexer/grammar specifications. The tables are unreadable and unmaintainable. You can't just`
	`68`	`+somehow "specialize" the tools to the F# grammar and then get rid of them as this doesn't give a useful, maintainable lexer or parser.`
	`69`	`+To our knowledge there is no way to convert an LALR(1) parser specification to readable, maintainable recursive descent parsing code.`
	`70`	`+`
	`71`	`+As a result, ingesting the tools into this repo (and modifying them here) would be counter-productive, as the tools would no longer be tested, documented or`
	`72`	`+maintained properly, and overall engineering quality would decrease. Further the bootstrap process for the repo then becomes very unwieldy.`
	`73`	`+`
	`74`	`+`

`‎src/utils/prim-lexing.fs‎`

Lines changed: 1 addition & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -2,6 +2,7 @@`
`2`	`2`
`3`	`3`	`#nowarn"47"// recursive initialization of LexBuffer`
`4`	`4`
	`5`	`+// NOTE: the code in this file is a drop-in replacement runtime for Lexing.fs from the FsLexYacc repository`
`5`	`6`
`6`	`7`	`namespaceInternal.Utilities.Text.Lexing`
`7`	`8`

`‎src/utils/prim-lexing.fsi‎`

Lines changed: 22 additions & 3 deletions

Original file line number	Diff line number	Diff line change
`@@ -2,7 +2,10 @@`
`2`	`2`
`3`	`3`	`// LexBuffers are for use with automatically generated lexical analyzers,`
`4`	`4`	`// in particular those produced by 'fslex'.`
`5`		`-`
	`5`	`+//`
	`6`	`+// NOTE: the code in this file is a drop-in replacement runtime for Lexing.fsi from the FsLexYacc repository`
	`7`	`+// and is referenced by generated code for the three FsLex generated lexers in the F# compiler.`
	`8`	`+// The underlying table format intepreted must precisely match the format generated by FsLex.`
`6`	`9`	`namespaceInternal.Utilities.Text.Lexing`
`7`	`10`
`8`	`11`	`openSystem.Collections.Generic`
`@@ -14,27 +17,35 @@ open Microsoft.FSharp.Control`
`14`	`17`	`typeinternalPosition=`
`15`	`18`	`/// The file index for the file associated with the input stream, use <c>fileOfFileIndex</c> in range.fs to decode`
`16`	`19`	`valFileIndex:int`
	`20`	`+`
`17`	`21`	`/// The line number in the input stream, assuming fresh positions have been updated`
`18`	`22`	`/// for the new line by modifying the EndPos property of the LexBuffer.`
`19`	`23`	`valLine:int`
	`24`	`+`
`20`	`25`	`/// The line number for the position in the input stream, assuming fresh positions have been updated`
`21`	`26`	`/// using for the new line.`
`22`	`27`	`valOriginalLine:int`
	`28`	`+`
`23`	`29`	`/// The character number in the input stream.`
`24`	`30`	`valAbsoluteOffset:int`
	`31`	`+`
`25`	`32`	`/// Return absolute offset of the start of the line marked by the position.`
`26`	`33`	`valStartOfLineAbsoluteOffset:int`
	`34`	`+`
`27`	`35`	`/// Return the column number marked by the position,`
`28`	`36`	`/// i.e. the difference between the <c>AbsoluteOffset</c> and the <c>StartOfLineAbsoluteOffset</c>`
`29`	`37`	`memberColumn:int`
`30`		`-// Given a position just beyond the end of a line, return a position at the start of the next line.`
	`38`	`+`
	`39`	`+/// Given a position just beyond the end of a line, return a position at the start of the next line.`
`31`	`40`	`memberNextLine:Position`
`32`	`41`
`33`	`42`	`/// Given a position at the start of a token of length n, return a position just beyond the end of the token.`
`34`	`43`	`memberEndOfToken:n:int->Position`
	`44`	`+`
`35`	`45`	`/// Gives a position shifted by specified number of characters.`
`36`	`46`	`memberShiftColumnBy:by:int->Position`
`37`		`-// Same line, column -1.`
	`47`	`+`
	`48`	`+/// Same line, column -1.`
`38`	`49`	`memberColumnMinusOne:Position`
`39`	`50`
`40`	`51`	`/// Apply a #line directive.`
`@@ -47,11 +58,15 @@ type internal Position =`
`47`	`58`
`48`	`59`	`[<Sealed>]`
`49`	`60`	`/// Input buffers consumed by lexers generated by<c>fslex.exe</c>.`
	`61`	`+/// The type must be generic to match the code generated by FsLex and FsYacc(if you would like to`
	`62`	`+/// fix this, please submit a PR to the FsLexYacc repository allowing for optional emit of a non-generic type reference).`
`50`	`63`	`type internal LexBuffer<'Char>=`
`51`	`64`	`/// The start position for the lexeme.`
`52`	`65`	`memberStartPos:Position with get,set`
	`66`	`+`
`53`	`67`	`/// The end position for the lexeme.`
`54`	`68`	`memberEndPos:Position with get,set`
	`69`	`+`
`55`	`70`	`/// The matched string.`
`56`	`71`	`memberLexeme:'Char[]`
`57`	`72`
`@@ -67,13 +82,17 @@ type internal LexBuffer<'Char> =`
`67`	`82`	`/// Create a lex buffer suitable for Unicode lexing that reads characters from the given array.`
`68`	`83`	`/// Important: does take ownership of the array.`
`69`	`84`	`static memberFromChars:char[]->LexBuffer<char>`
	`85`	`+`
`70`	`86`	`/// Create a lex buffer that reads character or byte inputs by using the given function.`
`71`	`87`	`static memberFromFunction:('Char[]* int* int-> int)->LexBuffer<'Char>`
`72`	`88`
`73`	`89`	`/// The type of tables for an unicode lexer generated by <c>fslex.exe</c>.`
`74`	`90`	`[<Sealed>]`
`75`	`91`	`type internal UnicodeTables=`
	`92`	`+`
	`93`	`+/// Create the tables from raw data`
`76`	`94`	`static memberCreate:uint16[][]* uint16[]-> UnicodeTables`
	`95`	`+`
`77`	`96`	`/// Interpret tables for a unicode lexer generated by<c>fslex.exe</c>.`
`78`	`97`	`member Interpret: initialState:int* LexBuffer<char>-> int`
`79`	`98`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commitefb68b8

File tree

5 files changed

5 files changed

`‎src/buildtools/README-fslexyacc.md‎`

`‎src/utils/prim-lexing.fs‎`

`‎src/utils/prim-lexing.fsi‎`

0 commit comments