In this post, I will demonstrate how to optimize the Rust CLI toolast-grep by 10x.
Rust itself usually runs fast enough, but it is not a silver bullet to all performance issues.
In my case, I didn’t pay enough attention to runtime details or opted for naive implementation for a quick prototype. And these inadvertent mistakes and deliberate slacking off became ast-grep’s bottleneck.
ast-grep is my project to help you search and rewrite code usingabstract syntax tree.
Conceptually, ast-grep takes a piece of pattern code (think of it like a regular expression but for AST), matches the pattern against your codebase, and gives a list of matched AST nodes back to you. See theplayground for a live demo.
I designed ast-grpe’s architecture with performance in mind. Here are a few performance-related highlights:
Okay, enough self-promotion! If it is designed to be fast, how comes this blog? Let’s dive into the performance bottleneck I found in my bad code.
Spoiler. It’s my bad to write slow Rust.
The first thing to optimize a program is to profile it. I am lazy this time and used theflamegraph tool.
Installing it is simple:
cargo install flamegraphThen run it against ast-grep! No other setup is needed, compared to other profiling tools!
This time I’m using an ast-grep port ofes-lint againstTypeScript’ssrc folder.
This is the profiling command I used.
sudo flamegraph - sg scan -c eslint/sgconfig.yml TypeScript/src - json > /dev/nullTheflamegraph looks like this.
Optimizing the program is a matter of finding the hotspots in theflamegraph and fix them.
For a more intuitive feeling about performance, I used the old commandtime to measure the wall time to run the command. The result is not good.
time sg scan -c eslint/sgconfig.yml TypeScript/src
17.63s user, 0.46s system, 167% cpu, 10.823 totalThe time beforeuser is the actual CPU time spent on my program. The time beforetotal represents the wall time. The ratio between them is CPU utilization. In this case, it is 167%. It means my program is not fully utilizing the CPU.
It only runs six rules against the codebase and it costs about 10 whole seconds!
In contrast, running one ast-grep pattern against the TypeScript source only costs 0.5 seconds and the CPU utilization is decent.
time sg run -p '$A && $A()' TypeScript/src - json > /dev/null
1.96s user, 0.11s system, 329% cpu, 0.628 totalThe first thing I noticed is that theregex::Regex type is cloned a lot. I do know it is expensive to compile a regex, but I did not expect cloning one will be the bottleneck.
Much to my limited understanding,dropping Regex is also expensive!
Fortunately, the fix is simple: I can use a reference to the regex instead of cloning it.
This optimization alone shaves about 50% of execution time.
time sg scan -c eslint/sgconfig.yml TypeScript/src - json > /dev/null
13.89s user, 0.74s system, 274% cpu 5.320 totalThe newflamegraph looks like this:
The second thing I noticed is that thematch_node function is called a lot. It is the function that matches a pattern against an AST node.
ast-grep can match an AST node by rules, and those rules can be composed together into more complex rules.
For example, the ruleany: [rule1, rule2] is a composite rule that consists of two sub-rules and the composite rule matches a node when either one of the sub-rules matches the node.
This can be expensive since multiple rules must be tried for every node to see if they actually make a match.
I have already foreseen it so every rule in ast-grep has an optimization calledpotential_kinds. AST node in tree-sitter has its own type encoded in an unsigned number calledkind.
If a rule can only match nodes with specific kinds, then we can avoid callingmatch_node for nodes if its kind is not in thepotential_kinds set.
I used aBitSet to encode the set of potential kinds. Naturally thepotential_kinds of composite rules can be constructed by merging thepotential_kinds of its sub-rules, according to their logic nature.
For example,any‘s potential_kinds is the union of its sub-rules’potential_kinds, andall’spotential_kinds is the intersection of its sub-rules potential_kinds.
Using this optimization, I can avoid callingmatch_node for nodes that can never match a rule. This optimization shaves another 40% of execution time!
sg scan -c eslint/sgconfig.yml TypeScript/src - json > /dev/null
11.57s user, 0.48s system, 330% cpu, 3.644 totalThe new flamegraph is:
Finally, the function callts_tree_cursor_child_iterator_next caught my eye. It meant that a lot of time was spent traversing the AST tree.
Well, I dumbly iterated through all six rules and matched the whole AST tree for each rule. This is a lot of duplicated work!
So I used a data structure to combine these rules, according to theirpotential_kinds. When I’m traversing the AST tree, I will first retrieve the rules withpotential_kinds containing the kind of the current node. Then I will only run these rules against the node. And nodes without anypotential_kinds hit will be naturally skipped during the traversal.
This is a huge optimization! The ending result is less than 1 second! And the CPU utilization is pretty good.
sg scan -c eslint/sgconfig.yml TypeScript/src - json > /dev/null
2.82s user, 0.12s system, 301% cpu, 0.975 totalThe final flamegraph looks like this. I’m too lazy to optimize more. I’m happy with the sub-second result for now.
Optimizing ast-grep is a fun journey. I learned a lot about Rust and performance tuning. I hope you enjoyed this post as well.
🌐 Frontend Vimmer, ⚒️OSS with @typescript @vuejs and @rustlang 💻 Previously worked at @BytedanceTalk. Hobby project:https://ast-grep.github.io/