Compiler Services: Processing SyntaxTree
This tutorial demonstrates how to get the SyntaxTree (AST)for F# code and how to walk over the tree. This can be used for creating toolssuch as code formatter, basic refactoring or code navigation tools. The untypedsyntax tree contains information about the code structure, but does not containtypes and there are some ambiguities that are resolved only later by the typechecker. You can also combine the SyntaxTree information with the API availablefromeditor services.
NOTE: The FSharp.Compiler.Service API is subject to change when later versions of the nuget package are published
Getting the SyntaxTree
To access the untyped AST, you need to create an instance ofFSharpChecker
.This type represents a context for type checking and parsing and corresponds eitherto a stand-alone F# script file (e.g. opened in Visual Studio) or to a loaded projectfile with multiple files. Once you have an instance ofFSharpChecker
, you canuse it to perform "untyped parse" which is the first step of type-checking. Thesecond phase is "typed parse" and is used byeditor services.
To use the interactive checker, referenceFSharp.Compiler.Service.dll
and open theSourceCodeServices
namespace:
#r"FSharp.Compiler.Service.dll"openSystemopenFSharp.Compiler.CodeAnalysisopenFSharp.Compiler.Text
Performing untyped parse
The untyped parse operation is very fast (compared to type checking, which cantake notable amount of time) and so we can perform it synchronously. First, weneed to createFSharpChecker
- the constructor takes an argument thatcan be used to notify the checker about file changes (which we ignore).
// Create an interactive checker instanceletchecker=FSharpChecker.Create()
To get the AST, we define a function that takes file name and the source code(the file is only used for location information and does not have to exist).We first need to get "interactive checker options" which represents the context.For simple tasks, you can useGetProjectOptionsFromScriptRoot
which infersthe context for a script file. Then we use theParseFile
method andreturn theParseTree
property:
/// Get untyped tree for a specified inputletgetUntypedTree(file,input)=// Get compiler options for the 'project' implied by a single script fileletprojOptions,diagnostics=checker.GetProjectOptionsFromScript(file,input,assumeDotNetFramework=false)|>Async.RunSynchronouslyletparsingOptions,_errors=checker.GetParsingOptionsFromProjectOptions(projOptions)// Run the first phase (untyped parsing) of the compilerletparseFileResults=checker.ParseFile(file,input,parsingOptions)|>Async.RunSynchronouslyparseFileResults.ParseTree
Walking over the AST
The abstract syntax tree is defined as a number of discriminated unions that representdifferent syntactical elements (such as expressions, patterns, declarations etc.). The bestway to understand the AST is to look at the definitions inSyntaxTree.fsi
in the source code.
The relevant parts are in the following namespace:
openFSharp.Compiler.Syntax
When processing the AST, you will typically write a number of mutually recursive functionsthat pattern match on the different syntactical elements. There is a number of elementsthat need to be supported - the top-level element is module or namespace declaration,containing declarations inside a module (let bindings, types etc.). A let declaration insidea module then contains expressions, which can contain patterns.
Walking over patterns and expressions
We start by looking at functions that walk over expressions and patterns - as we walk,we print information about the visited elements. For patterns, the input is of typeSynPat
and has a number of cases includingWild
(for_
pattern),Named
(for<pat> as name
) andLongIdent
(for aFoo.Bar
name). Note that the parsed patternis occasionally more complex than what is in the source code (in particular,Named
isused more often):
/// Walk over a pattern - this is for example used in/// let <pat> = <expr> or in the 'match' expressionletrecvisitPattern=function|SynPat.Wild_->printfn" .. underscore pattern"|SynPat.Named(ident=SynIdent(ident=name))->printfn" .. named as '%s'"name.idText|SynPat.LongIdent(longDotId=SynLongIdent(id=ident))->letnames=String.concat"."[foriinident->i.idText]printfn" .. identifier:%s"names|pat->printfn" .. other pattern:%A"pat
The function is recursive (for nested patterns such as(foo, _) as bar
), but it does notcall any of the functions defined later (because patterns cannot contain other syntacticalelements).
The next function iterates over expressions - this is where most of the work would be andthere are around 20 cases to cover (typeSynExpr.
and you'll get completion with otheroptions). In the following, we only show how to handleif .. then ..
andlet .. = ...
:
/// Walk over an expression - if expression contains two or three/// sub-expressions (two if the 'else' branch is missing), let expression/// contains pattern and two sub-expressionsletrecvisitExpressione=matchewith|SynExpr.IfThenElse(ifExpr=cond;thenExpr=trueBranch;elseExpr=falseBranchOpt)->// Visit all sub-expressionsprintfn"Conditional:"visitExpressioncondvisitExpressiontrueBranchfalseBranchOpt|>Option.itervisitExpression|SynExpr.LetOrUse(_,_,bindings,body,_,_)->// Visit bindings (there may be multiple// for 'let .. = .. and .. = .. in ...'printfn"LetOrUse with the following bindings:"forbindinginbindingsdolet(SynBinding(headPat=headPat;expr=init))=bindingvisitPatternheadPatvisitExpressioninit// Visit the body expressionprintfn"And the following body:"visitExpressionbody|expr->printfn" - not supported expression:%A"expr
ThevisitExpression
function will be called from a function that visits all top-leveldeclarations inside a module. In this tutorial, we ignore types and members, but that wouldbe another source of calls tovisitExpression
.
Walking over declarations
As mentioned earlier, the AST of a file contains a number of module or namespace declarations(top-level node) that contain declarations inside a module (let bindings or types) or insidea namespace (just types). The following function walks over declarations - we ignore types,nested modules and all other elements and look only at top-levellet
bindings (values andfunctions):
/// Walk over a list of declarations in a module. This is anything/// that you can write as a top-level inside module (let bindings,/// nested modules, type declarations etc.)letvisitDeclarationsdecls=fordeclarationindeclsdomatchdeclarationwith|SynModuleDecl.Let(isRec,bindings,range)->// Let binding as a declaration is similar to let binding// as an expression (in visitExpression), but has no bodyforbindinginbindingsdolet(SynBinding(headPat=pat;expr=body))=bindingvisitPatternpatvisitExpressionbody|_->printfn" - not supported declaration:%A"declaration
ThevisitDeclarations
function will be called from a function that walks over asequence of module or namespace declarations. This corresponds, for example, to a filewith multiplenamespace Foo
declarations:
/// Walk over all module or namespace declarations/// (basically 'module Foo =' or 'namespace Foo.Bar')/// Note that there is one implicitly, even if the file/// does not explicitly define it..letvisitModulesAndNamespacesmodulesOrNss=formoduleOrNsinmodulesOrNssdolet(SynModuleOrNamespace(longId=lid;decls=decls))=moduleOrNsprintfn"Namespace or module:%A"lidvisitDeclarationsdecls
Now that we have functions that walk over the elements of the AST (starting from declaration,down to expressions and patterns), we can get AST of a sample input and run the above function.
Putting things together
As already discussed, thegetUntypedTree
function usesFSharpChecker
to run the firstphase (parsing) on the AST and get back the tree. The function requires F# source code togetherwith location of the file. The location does not have to exist (it is used only for locationinformation) and it can be in both Unix and Windows formats:
// Sample input for the compiler serviceletinput=""" let foo() = let msg = "Hello world" if true then printfn "%s" msg """// File name in Unix formatletfile="/home/user/Test.fsx"// Get the AST of sample F# codelettree=getUntypedTree(file,SourceText.ofStringinput)
When you run the code in F# interactive, you can entertree;;
in the interactive console andsee a pretty printed representation of the data structure - the tree contains a lot of information,so this is not particularly readable, but it gives you a good idea about how the tree looks.
The returnedtree
value is again a discriminated union that can be two different cases - one caseisParsedInput.SigFile
which represents F# signature file (*.fsi
) and the other one isParsedInput.ImplFile
representing regular source code (*.fsx
or*.fs
). The implementationfile contains a sequence of modules or namespaces that we can pass to the function implementedin the previous step:
// Extract implementation file detailsmatchtreewith|ParsedInput.ImplFile(implFile)->// Extract declarations and walk over themlet(ParsedImplFileInput(contents=modules))=implFilevisitModulesAndNamespacesmodules|_->failwith"F# Interface file (*.fsi) not supported."
Summary
In this tutorial, we looked at the basics of working with the untyped abstract syntax tree. This is acomprehensive topic, so it is not possible to explain everything in a single article. TheFantomas project is a good example of a tool based on the untypedAST that can help you understand more. In practice, it is also useful to combine the information herewith some information you can obtain from theeditor services discussed in the nexttutorial.
namespace FSharp
--------------------
namespace Microsoft.FSharp
<summary> Used to parse and check F# source code.</summary>
Get untyped tree for a specified input
type Async = static member AsBeginEnd: computation: ('Arg -> Async<'T>) -> ('Arg * AsyncCallback * objnull -> IAsyncResult) * (IAsyncResult -> 'T) * (IAsyncResult -> unit) static member AwaitEvent: event: IEvent<'Del,'T> * ?cancelAction: (unit -> unit) -> Async<'T> (requires delegate and 'Del :> Delegate) static member AwaitIAsyncResult: iar: IAsyncResult * ?millisecondsTimeout: int -> Async<bool> static member AwaitTask: task: Task<'T> -> Async<'T> + 1 overload static member AwaitWaitHandle: waitHandle: WaitHandle * ?millisecondsTimeout: int -> Async<bool> static member CancelDefaultToken: unit -> unit static member Catch: computation: Async<'T> -> Async<Choice<'T,exn>> static member Choice: computations: Async<'T option> seq -> Async<'T option> static member FromBeginEnd: beginAction: (AsyncCallback * objnull -> IAsyncResult) * endAction: (IAsyncResult -> 'T) * ?cancelAction: (unit -> unit) -> Async<'T> + 3 overloads static member FromContinuations: callback: (('T -> unit) * (exn -> unit) * (OperationCanceledException -> unit) -> unit) -> Async<'T> ...
--------------------
type Async<'T>
member FSharpChecker.ParseFile: fileName: string * sourceText: ISourceText * options: FSharpParsingOptions * ?cache: bool * ?userOpName: string -> Async<FSharpParseFileResults>
<summary> The syntax tree resulting from the parse</summary>
Walk over a pattern - this is for example used in
let <pat> = <expr> or in the 'match' expression
module SynPatfrom FSharp.Compiler.Syntax
--------------------
type SynPat = | Const of constant: SynConst * range: range | Wild of range: range | Named of ident: SynIdent * isThisVal: bool * accessibility: SynAccess option * range: range | Typed of pat: SynPat * targetType: SynType * range: range | Attrib of pat: SynPat * attributes: SynAttributes * range: range | Or of lhsPat: SynPat * rhsPat: SynPat * range: range * trivia: SynPatOrTrivia | ListCons of lhsPat: SynPat * rhsPat: SynPat * range: range * trivia: SynPatListConsTrivia | Ands of pats: SynPat list * range: range | As of lhsPat: SynPat * rhsPat: SynPat * range: range | LongIdent of longDotId: SynLongIdent * extraId: Ident option * typarDecls: SynValTyparDecls option * argPats: SynArgPats * accessibility: SynAccess option * range: range ... member Range: range with get
<summary> Represents a syntax tree for an F# pattern</summary>
<summary> A wildcard '_' in a pattern</summary>
<summary> A name pattern 'ident'</summary>
union case SynIdent.SynIdent: ident: Ident * trivia: FSharp.Compiler.SyntaxTrivia.IdentTrivia option -> SynIdent
--------------------
type SynIdent = | SynIdent of ident: Ident * trivia: IdentTrivia option member Range: range with get
<summary> Represents an identifier with potentially additional trivia information.</summary>
<summary> A long identifier pattern possibly with argument patterns</summary>
union case SynLongIdent.SynLongIdent: id: LongIdent * dotRanges: range list * trivia: FSharp.Compiler.SyntaxTrivia.IdentTrivia option list -> SynLongIdent
--------------------
type SynLongIdent = | SynLongIdent of id: LongIdent * dotRanges: range list * trivia: IdentTrivia option list member Dots: range list with get member IdentsWithTrivia: SynIdent list with get member LongIdent: LongIdent with get member Range: range with get member RangeWithoutAnyExtraDot: range with get member ThereIsAnExtraDotAtTheEnd: bool with get member Trivia: IdentTrivia list with get
<summary> Represents a long identifier with possible '.' at end. Typically dotRanges.Length = lid.Length-1, but they may be same if (incomplete) code ends in a dot, e.g. "Foo.Bar." The dots mostly matter for parsing, and are typically ignored by the typechecker, but if dotRanges.Length = lid.Length, then the parser must have reported an error, so the typechecker is allowed more freedom about typechecking these expressions. LongIdent can be empty list - it is used to denote that name of some AST element is absent (i.e. empty type name in inherit)</summary>
type String = interface IEnumerable<char> interface IEnumerable interface ICloneable interface IComparable interface IComparable<string> interface IConvertible interface IEquatable<string> interface IParsable<string> interface ISpanParsable<string> new: value: nativeptr<char> -> unit + 8 overloads ...
<summary>Represents text as a sequence of UTF-16 code units.</summary>
--------------------
String(value: nativeptr<char>) : String
String(value: char array) : String
String(value: ReadOnlySpan<char>) : String
String(value: nativeptr<sbyte>) : String
String(c: char, count: int) : String
String(value: nativeptr<char>, startIndex: int, length: int) : String
String(value: char array, startIndex: int, length: int) : String
String(value: nativeptr<sbyte>, startIndex: int, length: int) : String
String(value: nativeptr<sbyte>, startIndex: int, length: int, enc: Text.Encoding) : String
Walk over an expression - if expression contains two or three
sub-expressions (two if the 'else' branch is missing), let expression
contains pattern and two sub-expressions
module SynExprfrom FSharp.Compiler.Syntax
--------------------
type SynExpr = | Paren of expr: SynExpr * leftParenRange: range * rightParenRange: range option * range: range | Quote of operator: SynExpr * isRaw: bool * quotedExpr: SynExpr * isFromQueryExpression: bool * range: range | Const of constant: SynConst * range: range | Typed of expr: SynExpr * targetType: SynType * range: range | Tuple of isStruct: bool * exprs: SynExpr list * commaRanges: range list * range: range | AnonRecd of isStruct: bool * copyInfo: (SynExpr * BlockSeparator) option * recordFields: (SynLongIdent * range option * SynExpr) list * range: range * trivia: SynExprAnonRecdTrivia | ArrayOrList of isArray: bool * exprs: SynExpr list * range: range | Record of baseInfo: (SynType * SynExpr * range * BlockSeparator option * range) option * copyInfo: (SynExpr * BlockSeparator) option * recordFields: SynExprRecordField list * range: range | New of isProtected: bool * targetType: SynType * expr: SynExpr * range: range | ObjExpr of objType: SynType * argOptions: (SynExpr * Ident option) option * withKeyword: range option * bindings: SynBinding list * members: SynMemberDefns * extraImpls: SynInterfaceImpl list * newExprRange: range * range: range ... member IsArbExprAndThusAlreadyReportedError: bool with get member Range: range with get member RangeOfFirstPortion: range with get member RangeWithoutAnyExtraDot: range with get
<summary> Represents a syntax tree for F# expressions</summary>
<summary> F# syntax: if expr then expr F# syntax: if expr then expr else expr</summary>
<summary> F# syntax: let pat = expr in expr F# syntax: let f pat1 .. patN = expr in expr F# syntax: let rec f pat1 .. patN = expr in expr F# syntax: use pat = expr in expr</summary>
union case SynBinding.SynBinding: accessibility: SynAccess option * kind: SynBindingKind * isInline: bool * isMutable: bool * attributes: SynAttributes * xmlDoc: FSharp.Compiler.Xml.PreXmlDoc * valData: SynValData * headPat: SynPat * returnInfo: SynBindingReturnInfo option * expr: SynExpr * range: range * debugPoint: DebugPointAtBinding * trivia: FSharp.Compiler.SyntaxTrivia.SynBindingTrivia -> SynBinding
--------------------
type SynBinding = | SynBinding of accessibility: SynAccess option * kind: SynBindingKind * isInline: bool * isMutable: bool * attributes: SynAttributes * xmlDoc: PreXmlDoc * valData: SynValData * headPat: SynPat * returnInfo: SynBindingReturnInfo option * expr: SynExpr * range: range * debugPoint: DebugPointAtBinding * trivia: SynBindingTrivia member RangeOfBindingWithRhs: range with get member RangeOfBindingWithoutRhs: range with get member RangeOfHeadPattern: range with get
<summary> Represents a binding for a 'let' or 'member' declaration</summary>
Walk over a list of declarations in a module. This is anything
that you can write as a top-level inside module (let bindings,
nested modules, type declarations etc.)
<summary> Represents a definition within a module</summary>
<summary> A 'let' definition within a module</summary>
val range: range
--------------------
type range = Range
<summary> Represents a range within a file</summary>
Walk over all module or namespace declarations
(basically 'module Foo =' or 'namespace Foo.Bar')
Note that there is one implicitly, even if the file
does not explicitly define it..
union case SynModuleOrNamespace.SynModuleOrNamespace: longId: LongIdent * isRecursive: bool * kind: SynModuleOrNamespaceKind * decls: SynModuleDecl list * xmlDoc: FSharp.Compiler.Xml.PreXmlDoc * attribs: SynAttributes * accessibility: SynAccess option * range: range * trivia: FSharp.Compiler.SyntaxTrivia.SynModuleOrNamespaceTrivia -> SynModuleOrNamespace
--------------------
type SynModuleOrNamespace = | SynModuleOrNamespace of longId: LongIdent * isRecursive: bool * kind: SynModuleOrNamespaceKind * decls: SynModuleDecl list * xmlDoc: PreXmlDoc * attribs: SynAttributes * accessibility: SynAccess option * range: range * trivia: SynModuleOrNamespaceTrivia member Range: range with get
<summary> Represents the definition of a module or namespace</summary>
Get untyped tree for a specified input
<summary> Functions related to ISourceText objects</summary>
<summary> Creates an ISourceText object from the given string</summary>
module ParsedInputfrom FSharp.Compiler.Syntax
<summary> Holds operations for working with the untyped abstract syntax tree (<see cref="T:FSharp.Compiler.Syntax.ParsedInput" />). </summary>
--------------------
type ParsedInput = | ImplFile of ParsedImplFileInput | SigFile of ParsedSigFileInput member FileName: string with get member Identifiers: Set<string> with get member QualifiedName: QualifiedNameOfFile with get member Range: range with get
<summary> Represents the syntax tree for a parsed implementation or signature file</summary>
<summary> A parsed implementation file</summary>
union case ParsedImplFileInput.ParsedImplFileInput: fileName: string * isScript: bool * qualifiedNameOfFile: QualifiedNameOfFile * hashDirectives: ParsedHashDirective list * contents: SynModuleOrNamespace list * flags: bool * bool * trivia: FSharp.Compiler.SyntaxTrivia.ParsedInputTrivia * identifiers: Set<string> -> ParsedImplFileInput
--------------------
type ParsedImplFileInput = | ParsedImplFileInput of fileName: string * isScript: bool * qualifiedNameOfFile: QualifiedNameOfFile * hashDirectives: ParsedHashDirective list * contents: SynModuleOrNamespace list * flags: bool * bool * trivia: ParsedInputTrivia * identifiers: Set<string> member Contents: SynModuleOrNamespace list with get member FileName: string with get member HashDirectives: ParsedHashDirective list with get member IsExe: bool with get member IsLastCompiland: bool with get member IsScript: bool with get member QualifiedName: QualifiedNameOfFile with get member Trivia: ParsedInputTrivia with get
<summary> Represents the full syntax tree, file name and other parsing information for an implementation file</summary>