SwiftSyntax
SwiftSyntax is a Swift library that lets you parse, analyze, generate, and transform Swift source code. It’s based on the libSyntax library, and was spun out from the main Swift language repository in August 2017.
Together, the goal of these projects is to provide safe, correct, and intuitive facilities for structured editing, which is described thusly:
What is structured editing? It’s an editing strategy that is keenly aware of the structure of source code, not necessarily its representation (i.e. characters or bytes). This can be achieved at different granularities: replacing an identifier, changing a call to global function to a method call, or indenting and formatting an entire source file based on declarative rules.
At the time of writing, SwiftSyntax is still in development and subject to API changes. But you can start using it today to work with Swift source code in a programmatic way.
It’s currently used by the Swift Migrator, and there are ongoing efforts to adopt the tool, both internally and externally.
How Does It Work?
To understand how SwiftSyntax works, let’s take a step back and look at the Swift compiler architecture:
The Swift compiler is primarily responsible for turning Swift code into executable machine code. The process is divided up into several discrete steps, starting with the parser, which generates an abstract syntax tree (AST). From there, semantic analysis is performed on the syntax to produce a type-checked AST, which is lowered into Swift Intermediate Language (SIL); the SIL is transformed and optimized and itself lowered into LLVM IR, which is ultimately compiled into machine code.
The most important takeaway for our discussion is that SwiftSyntax operates on the AST generated at the first step of the compilation process. As such, it can’t tell you any semantic or type information about code.
Contrast this with something like SourceKit, which operates with a much more complete understanding of Swift code. This additional information can be helpful for implementing editor features like code-completion or navigating across files. But there are plenty of important use cases that can be satisfied on a purely syntactic level, such as code formatting and syntax highlighting.
Demystifying the AST
Abstract syntax trees can be difficult to understand in the abstract. So let’s generate one and see what it looks like.
Consider the following single-line Swift file, which declares a function named one() that returns the value 1:
func one() -> Int { return 1 }
Run the swiftc command on this file, passing the -frontend -emit-syntax arguments:
$ xcrun swiftc -frontend -emit-syntax ./One.swift
The result is a chunk of JSON representing the AST. Its structure becomes much clearer once you reformat the JSON:
{
    "kind": "SourceFile",
    "layout": [
        {
            "kind": "CodeBlockItemList",
            "layout": [
                {
                    "kind": "CodeBlockItem",
                    "layout": [
                        {
                            "kind": "FunctionDecl",
                            "layout": [
                                null,
                                null,
                                {
                                    "tokenKind": {
                                        "kind": "kw_func"
                                    },
                                    "leadingTrivia": [],
                                    "trailingTrivia": [
                                        {
                                            "kind": "Space",
                                            "value": 1
                                        }
                                    ],
                                    "presence": "Present"
                                },
                                {
                                    "tokenKind": {
                                        "kind": "identifier",
                                        "text": "one"
                                    },
                                    "leadingTrivia": [],
                                    "trailingTrivia": [],
                                    "presence": "Present"
                                },
                                ...
The Python json.tool module offers a convenient way to format JSON. It comes standard in macOS releases going back as far as anyone can recall. For example, here’s how you could use it with the piped compiler output:
$ xcrun swiftc -frontend -emit-syntax ./One.swift | python -m json.tool
At the top level, we have a SourceFile consisting of CodeBlockItemList elements and their constituent CodeBlockItem parts. This example has a single CodeBlockItem for the function declaration (FunctionDecl), which itself comprises subcomponents including a function signature, parameter clause, and return clause.
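To make that nesting easier to see, here is a minimal sketch that walks the JSON with Foundation’s JSONSerialization and prints one indented line per node. The embedded JSON is a hand-trimmed fragment of the output above, closed off so that it parses on its own; the outline(_:depth:) helper is a hypothetical name introduced for illustration, not part of SwiftSyntax.

```swift
import Foundation

// A hand-trimmed fragment of the compiler's JSON output, closed off so that it
// parses on its own. The real output contains many more nodes.
let json = """
{"kind": "SourceFile", "layout": [
  {"kind": "CodeBlockItemList", "layout": [
    {"kind": "CodeBlockItem", "layout": [
      {"kind": "FunctionDecl", "layout": [
        null,
        null,
        {"tokenKind": {"kind": "kw_func"}, "leadingTrivia": [],
         "trailingTrivia": [{"kind": "Space", "value": 1}], "presence": "Present"},
        {"tokenKind": {"kind": "identifier", "text": "one"}, "leadingTrivia": [],
         "trailingTrivia": [], "presence": "Present"}
      ]}
    ]}
  ]}
]}
"""

/// Hypothetical helper: returns one indented line per node, using the node's
/// `kind` for layout nodes and the token kind for leaves.
func outline(_ node: Any, depth: Int = 0) -> [String] {
    // `null` slots (absent optional children) are not dictionaries, so they are skipped.
    guard let object = node as? [String: Any] else { return [] }
    let indent = String(repeating: "  ", count: depth)

    // Tokens carry a `tokenKind` object instead of a `layout` array.
    if let token = object["tokenKind"] as? [String: Any],
       let kind = token["kind"] as? String {
        return ["\(indent)token(\(kind))"]
    }

    let kind = object["kind"] as? String ?? "?"
    let children = object["layout"] as? [Any] ?? []
    return ["\(indent)\(kind)"] + children.flatMap { outline($0, depth: depth + 1) }
}

let root = try JSONSerialization.jsonObject(with: Data(json.utf8))
let lines = outline(root)
print(lines.joined(separator: "\n"))
```

Run against the fragment, this prints SourceFile at the top, with each nested kind indented one level further and the two tokens (kw_func and identifier) at the bottom.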
The term trivia is used to describe anything that isn’t syntactically meaningful, like whitespace. Each token can have one or more pieces of leading and trailing trivia. For example, the space after the Int in the return clause (-> Int) is represented by the following piece of trailing trivia.
{"kind":"Space","value":1}
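One way to build intuition for trivia is to model it directly. The sketch below uses simplified, hypothetical stand-ins (TriviaPiece and Token are not the real SwiftSyntax types) to show that concatenating each token’s text with its leading and trailing trivia reproduces the original source exactly.

```swift
// Hypothetical, simplified stand-ins for SwiftSyntax's trivia and token types.
enum TriviaPiece {
    case spaces(Int)
    case newlines(Int)

    var text: String {
        switch self {
        case .spaces(let count): return String(repeating: " ", count: count)
        case .newlines(let count): return String(repeating: "\n", count: count)
        }
    }
}

struct Token {
    var text: String
    var leadingTrivia: [TriviaPiece] = []
    var trailingTrivia: [TriviaPiece] = []

    /// The token's text surrounded by its trivia, as it would appear in source.
    var sourceText: String {
        let leading = leadingTrivia.map { $0.text }.joined()
        let trailing = trailingTrivia.map { $0.text }.joined()
        return leading + text + trailing
    }
}

// The tokens of `func one() -> Int { return 1 }`, with whitespace stored as trailing trivia.
let tokens = [
    Token(text: "func", trailingTrivia: [.spaces(1)]),
    Token(text: "one"),
    Token(text: "("),
    Token(text: ")", trailingTrivia: [.spaces(1)]),
    Token(text: "->", trailingTrivia: [.spaces(1)]),
    Token(text: "Int", trailingTrivia: [.spaces(1)]),
    Token(text: "{", trailingTrivia: [.spaces(1)]),
    Token(text: "return", trailingTrivia: [.spaces(1)]),
    Token(text: "1", trailingTrivia: [.spaces(1)]),
    Token(text: "}"),
]

let source = tokens.map { $0.sourceText }.joined()
print(source) // func one() -> Int { return 1 }
```

Because no whitespace is thrown away, a tool built on this kind of representation can change one token and leave the formatting of everything else untouched.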
Working Around File System Constraints
SwiftSyntax generates abstract syntax trees by delegating system calls to swiftc. However, this requires code to be associated with a file in order to be processed, and it’s often useful to work with code as a string.
One way to work around this constraint is to write code to a temporary file and pass that to the compiler.
We’ve written about temporary files in the past, but nowadays, there’s a much nicer API for working with them that’s provided by the Swift Package Manager itself. In your Package.swift file, add the following package dependency, and add the "Utility" dependency to the appropriate target:
.package(url: "https://github.com/apple/swift-package-manager.git", from: "0.3.0"),
Now, you can import the Basic module and use its TemporaryFile API like so:
import Basic
import Foundation
import SwiftSyntax

let code: String // the Swift source to parse

let tempfile = try TemporaryFile(deleteOnClose: true)
defer { tempfile.fileHandle.closeFile() }
tempfile.fileHandle.write(code.data(using: .utf8)!)

let url = URL(fileURLWithPath: tempfile.path.asString)
let sourceFile = try SyntaxTreeParser.parse(url)
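If you’d rather not pull in the Swift Package Manager as a dependency, the same workaround can be sketched with Foundation alone. This is an alternative under stated assumptions, not the approach used above: writeTemporarySwiftFile(containing:) is a hypothetical helper, and unlike TemporaryFile it leaves cleanup to the caller.

```swift
import Foundation

/// Hypothetical helper: writes `code` to a uniquely named `.swift` file in the
/// temporary directory and returns its URL. The caller removes the file when done.
func writeTemporarySwiftFile(containing code: String) throws -> URL {
    let url = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
        .appendingPathExtension("swift")
    try code.write(to: url, atomically: true, encoding: .utf8)
    return url
}

let code = "func one() -> Int { return 1 }"
let url = try writeTemporarySwiftFile(containing: code)

// `url` can now be handed to anything that expects a file on disk,
// such as the parser call shown earlier. Here we only read it back.
let roundTripped = try String(contentsOf: url, encoding: .utf8)
print(roundTripped == code)

// Clean up explicitly, since nothing deletes the file for us.
try FileManager.default.removeItem(at: url)
```

The trade-off is that nothing deletes the file automatically, so every code path that creates one also needs to remove it.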
What Can You Do With It?
Now that we have a reasonable idea of how SwiftSyntax works, let’s talk about some of the ways that you can use it!
Writing Swift Code: The Hard Way
The first and least compelling use case for SwiftSyntax is to make writing Swift code an order of magnitude more difficult.
SwiftSyntax, by way of its Syntax APIs, allows you to generate entirely new Swift code from scratch. Unfortunately, doing this programmatically isn’t exactly a walk in the park.
For example, consider the following code:
import SwiftSyntax

let structKeyword = SyntaxFactory.makeStructKeyword(trailingTrivia: .spaces(1))

let identifier = SyntaxFactory.makeIdentifier("Example", trailingTrivia: .spaces(1))

let leftBrace = SyntaxFactory.makeLeftBraceToken()
let rightBrace = SyntaxFactory.makeRightBraceToken(leadingTrivia: .newlines(1))
let members = MemberDeclBlockSyntax { builder in
    builder.useLeftBrace(leftBrace)
    builder.useRightBrace(rightBrace)
}

let structureDeclaration = StructDeclSyntax { builder in
    builder.useStructKeyword(structKeyword)
    builder.useIdentifier(identifier)
    builder.useMembers(members)
}

print(structureDeclaration)
Whew. So what did all of that effort get us?
struct Example {
}
Oofa doofa.
This certainly isn’t going to replace GYB for everyday code generation purposes. (In fact, libSyntax and SwiftSyntax both make extensive use of gyb to generate their interfaces.)
But this interface can be quite useful when precision matters. For instance, you might use SwiftSyntax to implement a fuzzer for the Swift compiler, using it to randomly generate arbitrarily-complex-but-ostensibly-valid programs to stress test its internals.
Rewriting Swift Code
The example provided in the SwiftSyntax README shows how to write a program to take each integer literal in a source file and increment its value by one.
Looking at that, you can already extrapolate out to how this might be used to create a canonical swift-format tool. But for the moment, let’s consider a considerably less productive, and more seasonally appropriate (🎃), use of source rewriting:
import SwiftSyntax

public class ZalgoRewriter: SyntaxRewriter {
    public override func visit(_ token: TokenSyntax) -> Syntax {
        guard case let .stringLiteral(text) = token.tokenKind else {
            return token
        }

        return token.withKind(.stringLiteral(zalgo(text)))
    }
}
What’s that zalgo function all about? You’re probably better off not knowing…
Anyway, running this rewriter on your source code transforms all string literals in the following manner:
// Before 👋😄
print("Hello, world!")

// After 🦑😵
print("H͞͏̟̂ͩel̵ͬ͆͜ĺ͎̪̣͠ơ̡̼͓̋͝, w͎̽̇ͪ͢ǒ̩͔̲̕͝r̷̡̠͓̉͂l̘̳̆ͯ̊d!")
Spooky, right?
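For the morbidly curious, here is one way such a function could be written. To be clear, this is a hypothetical sketch and not the zalgo implementation used for the output above: it assumes that “zalgo text” simply means appending a few random characters from the Unicode Combining Diacritical Marks block (U+0300 to U+036F) after each letter. The intensity parameter and the generator-taking overload are likewise assumptions made for illustration.

```swift
/// Hypothetical `zalgo`: appends `intensity` random combining marks after each letter.
func zalgo<G: RandomNumberGenerator>(
    _ text: String,
    intensity: Int = 3,
    using generator: inout G
) -> String {
    // U+0300...U+036F is the Unicode "Combining Diacritical Marks" block.
    let marks: [Unicode.Scalar] = (UInt32(0x0300)...UInt32(0x036F)).compactMap { Unicode.Scalar($0) }

    var result = String.UnicodeScalarView()
    for scalar in text.unicodeScalars {
        result.append(scalar)
        // Leave spaces and punctuation alone.
        guard scalar.properties.isAlphabetic else { continue }
        for _ in 0..<intensity {
            result.append(marks.randomElement(using: &generator)!)
        }
    }
    return String(result)
}

/// Convenience overload that uses the system random number generator.
func zalgo(_ text: String) -> String {
    var generator = SystemRandomNumberGenerator()
    return zalgo(text, using: &generator)
}

let spooky = zalgo("Hello, world!")
print(spooky)
```

Taking the generator as a parameter keeps the function testable: pass a seeded generator and the output becomes reproducible.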
Highlighting Swift Code
Let’s conclude our look at SwiftSyntax with something that’s actually useful: a Swift syntax highlighter.
A syntax highlighter, in this sense, describes any tool that takes source code and formats it in a way that’s more suitable for display in HTML.
NSHipster is built on top of Jekyll, and uses the Ruby library Rouge to colorize the example code you see in every article. However, due to Swift’s relatively complex syntax and rapid evolution, the generated HTML isn’t always 100% correct.
Instead of messing with a pile of regular expressions, we could build a syntax highlighter that leverages SwiftSyntax’s superior understanding of the language.
At its core, the implementation is rather straightforward: implement a subclass of SyntaxRewriter and override the visit(_:) method that’s called for each token as a source file is traversed. By switching over each of the different kinds of tokens, you can map them to the HTML markup for their corresponding highlighter tokens.
For example, numeric literals are represented with <span> elements whose class name begins with the letter m (mf for floating-point, mi for integer, etc.). Here’s the corresponding code in our SyntaxRewriter subclass:
import SwiftSyntax

class SwiftSyntaxHighlighter: SyntaxRewriter {
    var html: String = ""

    override func visit(_ token: TokenSyntax) -> Syntax {
        switch token.tokenKind {
        …
        case .floatingLiteral(let string):
            html += "<span class=\"mf\">\(string)</span>"
        case .integerLiteral(let string):
            if string.hasPrefix("0b") {
                html += "<span class=\"mb\">\(string)</span>"
            } else if string.hasPrefix("0o") {
                html += "<span class=\"mo\">\(string)</span>"
            } else if string.hasPrefix("0x") {
                html += "<span class=\"mh\">\(string)</span>"
            } else {
                html += "<span class=\"mi\">\(string)</span>"
            }
        …
        default:
            break
        }

        return token
    }
}
Although SyntaxRewriter has specialized visit(_:) methods for each of the different kinds of syntax elements, I found it easier to handle everything in a single switch statement. (Printing unhandled tokens in the default branch was a really helpful way to find any cases that I wasn’t already handling.) It’s not the most elegant of implementations, but it was a convenient place to start given my limited understanding of the library.
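The integer-literal branch is easy to try in isolation. The following sketch pulls that prefix check out into plain functions that don’t depend on SwiftSyntax; cssClass(forIntegerLiteral:), escapeHTML(_:), and span(_:className:) are hypothetical helper names, and the HTML escaping is an extra precaution that the excerpt above doesn’t show.

```swift
/// Maps the text of an integer literal to the class names used in the example above.
func cssClass(forIntegerLiteral text: String) -> String {
    if text.hasPrefix("0b") { return "mb" } // binary
    if text.hasPrefix("0o") { return "mo" } // octal
    if text.hasPrefix("0x") { return "mh" } // hexadecimal
    return "mi"                              // decimal
}

/// Escapes the characters that are significant in HTML text content.
func escapeHTML(_ text: String) -> String {
    var escaped = ""
    for character in text {
        switch character {
        case "&": escaped += "&amp;"
        case "<": escaped += "&lt;"
        case ">": escaped += "&gt;"
        default: escaped.append(character)
        }
    }
    return escaped
}

/// Wraps `text` in a `<span>` element with the given class name.
func span(_ text: String, className: String) -> String {
    return "<span class=\"\(className)\">\(escapeHTML(text))</span>"
}

let literals = ["42", "0b1010", "0o17", "0xFF"]
let html = literals.map { span($0, className: cssClass(forIntegerLiteral: $0)) }
print(html.joined(separator: "\n"))
```

Numeric literals never contain characters that need escaping, but routing every token through the same span helper means string literals and comments, which often do, are handled safely as well.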
Anyway, after a few hours of development, I was able to generate reasonable colorized output for a wide range of Swift syntactic features:
The project comes with a library and a command line tool. Go ahead and try it out and let me know what you think!