|
| 1 | +#Notes on FsLex and FsYacc |
| 2 | + |
| 3 | +For better or worse the F# compiler contains three tokenizers (`*.fsl`) and three |
| 4 | +grammars (`*.fsy`) implemented using FsLex and FsYacc respectively, including the all-important F# grammar itself. |
| 5 | +The canonical home for FsLex and FsYacc ishttp://github.com/fsprojects/FsLexYacc. |
| 6 | +FsLex and FsYacc are themselves built using earlier versions of FsLex and FsYacc. |
| 7 | + |
| 8 | +**If you would like to improve, modify, extend, test or document these |
| 9 | +tools, generally please do so in that repository. There are some exceptions, see below.** |
| 10 | + |
| 11 | +The`src\buildtools\fslex` and`src\buildtools\fsyacc` directories are an_exact_ copy of`packages\FsLexYacc.XYZ\src\fslex` and`packages\FsLexYacc.XYZ\src\fsyacc`. We should really verify this as part of our build. |
| 12 | +This copy is done because we needed to have a build-from-source story. |
| 13 | +In build-from-source, the only tool we can assume is an install of the .NET SDK. |
| 14 | +That means we have to build up FsLex and FsYacc from scratch,_including_ their own generated fslexlex.fs, fslexpars.fs and so on. |
| 15 | +We can't pick up the source from "packages" because in a build-from-source scenario we can't even fetch those |
| 16 | +packages - we really have to build from just our source tree and .NET SDK. |
| 17 | + |
| 18 | +Please do_not_ modify the code in these directories except by copying over from an upgraded FsLexYacc pacakge. |
| 19 | +Without the testing and documentation in the`FsLexYacc` repo, this copied code is just a bunch of untested, undocumented and |
| 20 | +largely generated code checked into our source tree. |
| 21 | + |
| 22 | +##What if I want to modify/improve FsLex and FsYacc |
| 23 | + |
| 24 | +First, be clear on what you want to do: |
| 25 | + |
| 26 | +1. You might want to update the_code generators_ for the fslex or fsyacc tools. |
| 27 | + |
| 28 | +2. You might want to update the_runtime_ of the fslex or fsyacc tools. |
| 29 | + |
| 30 | +For (1), to improve the code/table generators, make a PR to the`FsLexYacc` repository and go through the cycle of updating these files to match a package upgrade. |
| 31 | + |
| 32 | +For (2), normally for FsLexYacc-based tools the runtime is either a source inclusion of`Lexing.fs`, Lexing.fsi, Parsing.fs, Parsing.fsi or a reference to the`FsLexYacc.Runtime` package. The runtime contains LexBuffer and the lexing/parsing table interpreters. |
| 33 | + |
| 34 | +However long ago we decided to duplicate and ingest the_runtime_ files for FsLex and FsYacc into the F# compiler rather than taking them directly from the FsLexYacc project. This was mainly because we wanted to squeeze optimizations out of them based on profiling and simplify them a bit. The duplicated files are`prim-lexing.fs`,`prim-parsing.fs` and the corresponding`.fsi` files in`src/utils`. These files are sufficient to implement the contracts exepcted by the FsLex/FsYacc generated code, and require exactly the same table formats as generated by FsLex/FsYacc. |
| 35 | + |
| 36 | +This means you can improve some aspects of the_runtime_ for FsLex and FsYacc by making direct changes to`prim-lexing.fs` and`prim-parsing.fs`. |
| 37 | + |
| 38 | +For example, the_actual_`LexBuffer` type being used in the F# compiler (for all three lexers and grammars) is this one:https://github.com/Microsoft/visualfsharp/blob/master/src/utils/prim-lexing.fsi#L50. (That version of the Lex/Yacc runtime has added some things:`BufferLocalStore` for example, which we use for the`XmlDoc` accumulator as we strip those out. It's also dropped any mention of async lexing, and any mention of`byte`. The use |
| 39 | +of generics for`LexBuffer<'Char>` is also superfluous because`'Char` is always`char` but is needed because the FsLex/FsYacc generated code expects this type to be generic.) |
| 40 | + |
| 41 | +##What if I want to eridicate our use of FsLex and FsYacc? |
| 42 | + |
| 43 | +The use of FsLex and FsYacc in this repo is somewhat controversial since the C# compiler implementation uses hand-written lexers and parsers. |
| 44 | + |
| 45 | +In the balance the use of FsLex is fairly reasonable and unlikely to change, though moving to an alternative tokenization technique wouldn't be |
| 46 | +overly difficult given the declarative nature of`FsLex` tokenization. |
| 47 | + |
| 48 | +The use of a table-driven LALR(1) parser is more controversial: there is a general feeling that it would be great to |
| 49 | +somehow move on from FsYacc and do parsing some other way. However, it is not at all easy to do that and remain |
| 50 | +fully compatible. For this reason it is unlikely we will remove the use of FsYacc any time soon. However incremental |
| 51 | +modifications to extract more information from the grammer may yield good results. |
| 52 | + |
| 53 | +##Why aren't FsLex and FsYacc just ingested into this repo if we depend on them (and even have an exact copy of them for build-from-source)? |
| 54 | + |
| 55 | +FsLex and FsYacc are non-trivial tools that require documentation and testing. Also, for external users, they require packaging. Changes to their design should be |
| 56 | +considered carefully. While we are open to adding features to these tools specifically for use by the F# compiler, the tools are open source and available |
| 57 | +independently. For these reasons it is generally best that these tools live in their own repository. |
| 58 | + |
| 59 | +The copy of the`fslex` and`fsyacc` source code in`buildtools` is an exact copy and is not tested or documented |
| 60 | +apart from what's been done before in FsLexYacc repo. Adjusting these copies is not allowed and would be wrong from an engineering persepctive, |
| 61 | +because there's no place to put documentation or tests. |
| 62 | + |
| 63 | +Occasionally we discuss ingesting FsLex and FsYacc into this repository. This often comes up in the hope that by doing so |
| 64 | +we can somehow eventually code-fold them away until we no longer require them at all, instead moving to hand-written parsers |
| 65 | +and lexers. That's an admirable goal. However, moving the tools into this repo doesn't actually help with eliminating their |
| 66 | +use, and may indeed make it harder. This is because these tools use table generation |
| 67 | +based on very specific lexer/grammar specifications. The tables are unreadable and unmaintainable. You can't just |
| 68 | +somehow "specialize" the tools to the F# grammar and then get rid of them as this doesn't give a useful, maintainable lexer or parser. |
| 69 | +To our knowledge there is no way to convert an LALR(1) parser specification to readable, maintainable recursive descent parsing code. |
| 70 | + |
| 71 | +As a result, ingesting the tools into this repo (and modifying them here) would be counter-productive, as the tools would no longer be tested, documented or |
| 72 | +maintained properly, and overall engineering quality would decrease. Further the bootstrap process for the repo then becomes very unwieldy. |
| 73 | + |
| 74 | + |