F# Compiler Guide

Compiler Services: Using the F# tokenizer

This tutorial demonstrates how to call the F# language tokenizer. Given F# source code, the tokenizer generates a list of source code lines that contain information about the tokens on each line. For each token, you can get the type of the token, its exact location, and its color kind (keyword, identifier, number, operator, etc.).

NOTE: The FSharp.Compiler.Service API is subject to change when later versions of the NuGet package are published.

Creating the tokenizer

To use the tokenizer, reference FSharp.Compiler.Service.dll and open the FSharp.Compiler.Tokenization namespace:

    #r "FSharp.Compiler.Service.dll"
    open FSharp.Compiler.Tokenization

Now you can create an instance of FSharpSourceTokenizer. The constructor takes four arguments: the list of defined symbols, the file name of the source code, and (optionally) the language version and a strict-indentation flag. The defined symbols are required because the tokenizer handles #if directives. The file name is required only to specify locations in the source code (the file does not have to exist):

    let sourceTok = FSharpSourceTokenizer([], Some "C:\\test.fsx", Some "PREVIEW", None)

Using the sourceTok object, we can now (repeatedly) tokenize lines of F# source code.

Tokenizing F# code

The tokenizer operates on individual lines rather than on the entire source file. After getting a token, the tokenizer also returns a new state (as an FSharpTokenizerLexState value). This state can be used to tokenize F# code more efficiently: when the source code changes, you do not need to re-tokenize the entire file, only the parts that have changed.
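As a rough illustration of how the returned state could support incremental re-tokenization, the sketch below caches the end-of-line state for every line and, after an edit, rescans only until a line's new end state matches the cached one. The helper names (endState, buildCache, rescanFrom), the early-stop rule, and the `#r "nuget: ..."` reference are assumptions made for this example, not part of the library; ScanToken itself is explained in the next section. The sketch also assumes the edit does not change the number of lines.

```fsharp
#r "nuget: FSharp.Compiler.Service"
open FSharp.Compiler.Tokenization

let sourceTok = FSharpSourceTokenizer([], Some "C:\\test.fsx", Some "PREVIEW", None)

/// Scan one line to its end and return the state after the last token
let endState (line: string) (startState: FSharpTokenizerLexState) =
    let tokenizer = sourceTok.CreateLineTokenizer(line)
    let rec loop state =
        match tokenizer.ScanToken(state) with
        | Some _, state -> loop state
        | None, state -> state
    loop startState

/// Hypothetical cache: element i is the state at the end of line i
let buildCache (lines: string[]) =
    lines
    |> Array.scan (fun state line -> endState line state) FSharpTokenizerLexState.Initial
    |> Array.tail

/// After line `changed` was edited in place, rescan from that line and stop
/// as soon as a line ends in the same state as before the edit.
/// Updates the cache and returns the number of lines that were rescanned.
let rescanFrom (lines: string[]) (cache: FSharpTokenizerLexState[]) (changed: int) =
    let rec loop i state rescanned =
        if i >= lines.Length then rescanned
        else
            let newState = endState lines.[i] state
            let unchanged = newState.Equals(cache.[i])
            cache.[i] <- newState
            if unchanged then rescanned + 1
            else loop (i + 1) newState (rescanned + 1)
    let start =
        if changed = 0 then FSharpTokenizerLexState.Initial
        else cache.[changed - 1]
    loop changed start 0
```

An editor that keeps such a cache would typically rescan a single line for most edits; only edits that open or close a multi-line comment or string force the following lines to be rescanned as well.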

Tokenizing single line

To tokenize a single line, we create an FSharpLineTokenizer by calling CreateLineTokenizer on the FSharpSourceTokenizer object that we created earlier:

    let tokenizer = sourceTok.CreateLineTokenizer("let answer=42")

Now, we can write a simple recursive function that calls ScanToken on the tokenizer until it returns None (indicating the end of the line). When ScanToken succeeds, it returns an FSharpTokenInfo object with all the interesting details:

    /// Tokenize a single line of F# code
    let rec tokenizeLine (tokenizer: FSharpLineTokenizer) state =
        match tokenizer.ScanToken(state) with
        | Some tok, state ->
            // Print token name
            printf "%s " tok.TokenName
            // Tokenize the rest, in the new state
            tokenizeLine tokenizer state
        | None, state -> state

The function returns the new state, which is needed if you want to tokenize multiple lines and an earlier line ends with a multi-line comment. As the initial state, we can use FSharpTokenizerLexState.Initial:

    tokenizeLine tokenizer FSharpTokenizerLexState.Initial

The result is a sequence of tokens with names LET, WHITESPACE, IDENT, EQUALS and INT32. FSharpTokenInfo exposes a number of useful properties, including TokenName (the name of the token, as defined in the F# lexer), LeftColumn and RightColumn (the location of the token within the line), and CharClass and ColorClass (the token category, which can be used for colorizing F# code).

Note that the tokenizer is stateful: if you want to tokenize a single line multiple times, you need to call CreateLineTokenizer again.
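To look at more than the token name, a variation of the function above can collect a few FSharpTokenInfo fields instead of printing them. The following is a minimal sketch: describeLine is a hypothetical helper name, and the `#r "nuget: ..."` line assumes an F# Interactive version that supports package references (the `#r "FSharp.Compiler.Service.dll"` form used earlier works as well).

```fsharp
#r "nuget: FSharp.Compiler.Service"
open FSharp.Compiler.Tokenization

let sourceTok = FSharpSourceTokenizer([], Some "C:\\test.fsx", Some "PREVIEW", None)

/// Collect (name, left column, right column, color class) for each token on a line.
/// A fresh line tokenizer is created on every call because the tokenizer is stateful.
let describeLine (line: string) =
    let tokenizer = sourceTok.CreateLineTokenizer(line)
    let rec loop state acc =
        match tokenizer.ScanToken(state) with
        | Some tok, state ->
            loop state ((tok.TokenName, tok.LeftColumn, tok.RightColumn, tok.ColorClass) :: acc)
        | None, _ -> List.rev acc
    loop FSharpTokenizerLexState.Initial []

// Print one row per token: name, column range, color kind
for (name, left, right, color) in describeLine "let answer=42" do
    printfn "%-12s %2d-%2d %A" name left right color
```

Returning the data rather than printing it inside the recursion keeps the scanning loop reusable, for example for colorization or for locating a token under the cursor.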

Tokenizing sample code

To run the tokenizer on a longer code sample or an entire file, you need to read the sample input as a collection of string values:

    let lines = """
      // Hello world
      let hello() =
         printfn "Hello world!" """.Split('\r', '\n')

To tokenize multi-line input, we again need a recursive function that keeps the current state. The following function takes the lines as a list of strings (together with the line number and the current state). We create a new tokenizer for each line and call tokenizeLine using the state from the end of the previous line:

    /// Print token names for multiple lines of code
    let rec tokenizeLines state count lines =
        match lines with
        | line :: lines ->
            // Create tokenizer & tokenize single line
            printfn "\nLine %d" count
            let tokenizer = sourceTok.CreateLineTokenizer(line)
            let state = tokenizeLine tokenizer state
            // Tokenize the rest using new state
            tokenizeLines state (count + 1) lines
        | [] -> ()

The function simply calls tokenizeLine (defined earlier) to print the names of all the tokens on each line. We can call it on the previous input with FSharpTokenizerLexState.Initial as the initial state and 1 as the number of the first line:

    lines
    |> List.ofSeq
    |> tokenizeLines FSharpTokenizerLexState.Initial 1

Ignoring some unimportant details (like whitespace at the beginning of each line and the first line, which is just whitespace), the code generates the following output:

    Line 1
      LINE_COMMENT LINE_COMMENT (...) LINE_COMMENT
    Line 2
      LET WHITESPACE IDENT LPAREN RPAREN WHITESPACE EQUALS
    Line 3
      IDENT WHITESPACE STRING_TEXT (...) STRING_TEXT STRING

It is worth noting that the tokenizer yields multiple LINE_COMMENT tokens and multiple STRING_TEXT tokens for each single comment or string (roughly, one for each word), so if you want to get the entire text of a comment or string, you need to concatenate the tokens.
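One way to concatenate such tokens is to merge runs of adjacent tokens that share the same name and then slice the merged column range out of the original line. The sketch below does this for a single line; mergedTokens is a hypothetical helper, and the code assumes that RightColumn is an inclusive column index. Note that the closing quote of a string is reported as a separate STRING token, so it is not merged with the preceding STRING_TEXT run.

```fsharp
#r "nuget: FSharp.Compiler.Service"
open FSharp.Compiler.Tokenization

let sourceTok = FSharpSourceTokenizer([], Some "C:\\test.fsx", Some "PREVIEW", None)

/// Return (token name, text) pairs for one line, with runs of adjacent
/// same-named tokens (e.g. LINE_COMMENT or STRING_TEXT) merged into one entry
let mergedTokens (line: string) =
    let tokenizer = sourceTok.CreateLineTokenizer(line)
    // Collect all tokens on the line in source order
    let rec scan state acc =
        match tokenizer.ScanToken(state) with
        | Some tok, state -> scan state (tok :: acc)
        | None, _ -> List.rev acc
    scan FSharpTokenizerLexState.Initial []
    // Extend the current run when the name repeats, otherwise start a new run
    |> List.fold (fun acc (tok: FSharpTokenInfo) ->
        match acc with
        | (name, left, _) :: rest when name = tok.TokenName ->
            (name, left, tok.RightColumn) :: rest
        | _ -> (tok.TokenName, tok.LeftColumn, tok.RightColumn) :: acc) []
    |> List.rev
    // Assumption: RightColumn is inclusive; clamp to stay inside the line
    |> List.map (fun (name, left, right) ->
        name, line.[left .. min right (line.Length - 1)])

for (name, text) in mergedTokens "// Hello world" do
    printfn "%s: %s" name text
```

Comments or strings that span several lines would additionally need the state to be threaded from line to line, as tokenizeLines does above.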
