
Introduction to Code Generation in Rust
This article is about generating Rust code from other Rust code, not about the code generation step of the rustc compiler. Another term for source code generation is metaprogramming, but it will be referred to as code generation here. The reader is expected to have some Rust knowledge.
What problems can it solve?
Say I want to ship a web frontend embedded inside a Rust binary to end users, for example as part of a desktop application. Projects like Tauri achieve embedding with code generation by writing Rust code that generates more Rust code. Why does Tauri choose to use code generation over less complicated solutions? Let’s take a look at what a less complicated solution might look like.
Imagine the output of our web frontend looks like:
    dist
    ├── assets
    │   ├── script-44b5bae5.js
    │   ├── style-48a8825f.css
    ├── index.html
Let’s embed these in our Rust project by using include_str!(), which adds the content of the specified file into the binary. That would look something like this:
    use std::collections::HashMap;

    fn main() {
        let mut assets = HashMap::new();
        assets.insert("/index.html", include_str!("../dist/index.html"));
        assets.insert(
            "/assets/script-44b5bae5.js",
            include_str!("../dist/assets/script-44b5bae5.js"),
        );
        assets.insert(
            "/assets/style-48a8825f.css",
            include_str!("../dist/assets/style-48a8825f.css"),
        );
    }
Straightforward enough, now we can grab those assets directly from the final binary! However, what if we don’t always know the assets’ filenames ahead of time? Let’s say we have worked more on our frontend project and now its output looks like:
    dist
    ├── assets
    │   │   # script-44b5bae5.js previously
    │   ├── script-581f5c69.js
    │   │
    │   │   # style-48a8825f.css previously
    │   ├── style-e49f12aa.css
    ├── index.html
Ah… the filenames of our assets have changed due to our frontend bundler utilizing cache busting. The Rust code no longer compiles until we fix the filenames inside of it. It would be a terrible developer experience if we had to update our Rust code every time we changed the frontend - imagine if we had dozens of assets! Tauri uses code generation to avoid this by finding the assets at compile time and generating Rust code which references the correct assets.
Tools
Let’s talk about a few tools for code generation and then use them to implement our own simple asset bundler.
The quote crate enables us to write Rust code that gets transformed into data, which then generates syntactically correct Rust code. This crate is ubiquitous across the Rust ecosystem for writing code generation.
The walkdir crate provides an easy way to recursively grab all items in a directory. This crate is highly applicable for our asset bundler use-case.
The phf crate provides a hash map built using perfect hash functions. This is useful when all keys and values in the map are known before it’s built. This crate is highly applicable for our asset bundler use-case.
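To get a feel for what walkdir saves us from writing, here is a minimal std-only sketch of a recursive file walk. The function name collect_files is hypothetical and not part of the article's project, and the sketch assumes symlinks can simply be skipped; walkdir itself offers far more control.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every regular file below `dir`.
/// Hypothetical std-only stand-in for what walkdir does for us.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        let path = entry.path();
        if file_type.is_dir() {
            // descend into subdirectories
            collect_files(&path, out)?;
        } else if file_type.is_file() {
            // symlinks are neither followed nor collected in this sketch
            out.push(path);
        }
    }
    Ok(())
}
```

Unlike the version we write later, this sketch stops at the first I/O error instead of ignoring it.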
Rust code generation typically occurs in build scripts or macros. We will be building our simple asset bundler using build scripts because we will be accessing the disk. While procedural macros can also do that, it can be problematic in a few ways.
Building the Assets Bundler
The source code is available on GitHub if you want to see how everything is put together afterwards.
Create our library
Let’s start off with creating a new Rust library:
    cargo new --lib asset-bundler
    cd asset-bundler
We want to create a way for applications that use this library to grab the assets, so let’s create that first. This will involve us creating a wrapper around phf::Map and a method to let callers get the content.
    cargo add phf --features macros
We don’t need too much functionality from our Assets struct, just a way to create it and a way to get at what’s inside of it. The following goes into src/lib.rs:
    pub use phf; // re-export phf so we can use it later

    type Map = phf::Map<&'static str, &'static str>;

    /// Container for compile-time embedded assets.
    pub struct Assets(Map);

    impl From<Map> for Assets {
        fn from(value: Map) -> Self {
            Self(value)
        }
    }

    impl Assets {
        /// Get the contents of the specified asset path.
        pub fn get(&self, path: &str) -> Option<&str> {
            self.0.get(path).copied()
        }
    }
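To see why get ends with copied(), here is a small std-only sketch that swaps phf::Map for a regular HashMap. DemoAssets and demo_assets are hypothetical names used only for illustration; the lookup has the same shape as the one above, where the map lookup yields an Option<&&str> and copied() turns it into an Option<&str>.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `Assets`, backed by a std `HashMap`
/// instead of `phf::Map` so it runs without extra dependencies.
pub struct DemoAssets(HashMap<&'static str, &'static str>);

impl DemoAssets {
    /// `HashMap::get` returns `Option<&&str>` here;
    /// `copied()` removes one layer of reference.
    pub fn get(&self, path: &str) -> Option<&str> {
        self.0.get(path).copied()
    }
}

/// Build a tiny demo map with made-up contents.
pub fn demo_assets() -> DemoAssets {
    let mut map = HashMap::new();
    map.insert("index.html", "<h1>hello</h1>");
    DemoAssets(map)
}
```

The difference to the real thing is when the map gets built: the HashMap is filled at runtime, while a phf map is constructed at compile time.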
Codegen
Now, we build the library that will be used in a build script to generate our code. Because we will be having multiple crates in the same repository, let’s quickly convert the project to a cargo workspace. Let’s add the following to the top of our Cargo.toml:
    [workspace]
    members = ["codegen"]
Now we are ready to continue creating our codegen library. Run these commands to create our project and grab our dependencies:
    cargo new --lib codegen --name asset-bundler-codegen
    cargo add quote walkdir --package asset-bundler-codegen
Time to think a bit about what functionality we need and boil it down into a few concrete steps.
We pass an assets path to our function, which we will call base.
We check if base exists, or else we can’t do anything.
Recursively gather all file paths inside base.
Generate code to embed all the file paths.
One last thing to mention: we want to get assets by passing in a relative path. We want assets.get("index.html"), not assets.get("../dist/index.html"). This means we will need to keep track of that base directory passed into our function. Let’s write those requirements down as code inside of codegen/src/lib.rs:
    use std::path::{Path, PathBuf};

    /// Generate Rust code to create an [`asset_bundler::Assets`] from the passed path.
    pub fn codegen(path: &Path) -> std::io::Result<String> {
        // canonicalize also checks if the path exists
        // which is the only case that makes sense for us
        let base = path.canonicalize()?;
        let paths = gather_asset_paths(&base);
        Ok(generate_code(&paths, &base))
    }

    /// Recursively find all files in the passed directory.
    fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
        todo!()
    }

    /// Generate Rust code to create an [`asset_bundler::Assets`].
    fn generate_code(paths: &[PathBuf], base: &Path) -> String {
        todo!()
    }
Let’s take on gather_asset_paths first, since it’s more specific to our project than to codegen. We will use walkdir to recursively grab all the files from the passed base directory. This is a simple example project, so we will ignore errors for now by using flatten(), which flattens nested iterators. Because Result also implements IntoIterator (yielding its Ok value, or nothing for an Err), we are only left with successful values. Let’s implement it in codegen/src/lib.rs:
    use walkdir::WalkDir;

    /// Recursively find all files in the passed directory.
    fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
        let mut paths = Vec::new();
        for entry in WalkDir::new(base).into_iter().flatten() {
            // we only care about files, ignore directories
            if entry.file_type().is_file() {
                paths.push(entry.into_path())
            }
        }
        paths
    }
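The flatten() trick is not specific to walkdir. As a minimal sketch with made-up data (keep_ok is a hypothetical helper), the same call drops every Err from any iterator of Results:

```rust
/// Keep only the `Ok` values, silently dropping every `Err`.
/// This works because `Result` implements `IntoIterator`,
/// yielding one item for `Ok` and none for `Err`.
fn keep_ok(results: Vec<Result<u32, String>>) -> Vec<u32> {
    results.into_iter().flatten().collect()
}
```

Calling keep_ok(vec![Ok(1), Err("unreadable".to_string()), Ok(3)]) returns vec![1, 3]. Silently discarding errors is acceptable for an example, but a real bundler would probably want to report unreadable entries.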
Cool cool cool.
Now we have a list of all asset files that are supposed to be included in the binary. The second function will generate the actual Rust code, but let’s see what the code we are generating should look like. We need to make sure that:
We import the correct dependencies.
The phf::Map is created with all the values; we can use phf::phf_map! to help.
Our Assets struct from our first library is created.
The first point is pretty important: we need to make sure we are calling the correct library. We can prevent crate name collisions by using a leading :: on our use statement. Additionally, we need to make sure we use our re-exported phf, otherwise the end application will fail to compile if it itself doesn’t depend on phf.
Using the frontend example from above, this is what the generated code using phf_map! should look like:
    use ::asset_bundler::{Assets, phf::{self, phf_map}};

    let map = phf_map! {
        "index.html" => include_str!("../dist/index.html"),
        "assets/script-44b5bae5.js" => include_str!("../dist/assets/script-44b5bae5.js"),
        "assets/style-48a8825f.css" => include_str!("../dist/assets/style-48a8825f.css")
    };
    let assets = Assets::from(map);
Our first problem is that we only have the paths used in include_str!(); we don’t have the “key” paths. We also need to turn our paths into strings at some point, because that is how they are used in the generated code. Let’s first figure out how to transform our list of paths into a list of strings suitable for keys. We need to strip the base prefix we resolved earlier from all the paths, so let’s write that inside of codegen/src/lib.rs:
    /// Turn paths into relative paths suitable for keys.
    fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
        let mut keys = Vec::new();
        for path in paths {
            // ignore this failure case for this example
            if let Ok(key) = path.strip_prefix(base) {
                keys.push(key.to_string_lossy().into())
            }
        }
        keys
    }
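The same transformation can also be written as an iterator chain. The sketch below is one possible variant (relative_keys is a hypothetical name, and the paths in the usage note are made up); it behaves like the loop above, skipping any path that does not live under base.

```rust
use std::path::{Path, PathBuf};

/// Iterator-based variant of `keys`: strip `base` from each path
/// and skip any path that is not located under `base`.
fn relative_keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
    paths
        .iter()
        .filter_map(|path| path.strip_prefix(base).ok())
        .map(|key| key.to_string_lossy().into_owned())
        .collect()
}
```

With a base of /srv/dist, the path /srv/dist/assets/app.js becomes the key assets/app.js, while a path such as /elsewhere/readme.txt is dropped. Note that to_string_lossy keeps the platform's path separator, so on Windows the keys would likely contain backslashes unless they are normalized.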
The values of the map are easier. Their paths are already the ones include_str!() needs, so we just need to turn them into strings. Let’s write this one with an Iterator, which we also could have done with keys:
    let values = paths.iter().map(|p| p.to_string_lossy());
So now we have both keys and values in usable formats. Next comes the macro part, where we will actually be generating code from all the data.
First, let’s talk about why we are about to use double curly brackets. The outer pair delimits the quote! invocation, and the inner pair turns the generated code into a block expression. This is not something required when doing code generation, but in our case we want to use the resulting Assets anywhere. By using a block expression we can use it anywhere an expression is valid, which is lots of places.
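As a minimal sketch of what a block expression buys us (double and demo are hypothetical names), a block can bring its own use statement and still be passed wherever a value is expected:

```rust
/// Doubles its argument.
fn double(n: i32) -> i32 {
    n * 2
}

/// A block is an expression: its final expression becomes its value,
/// and the `use` inside stays scoped to the block.
fn demo() -> i32 {
    double({
        use std::cmp::max;
        max(20, 21)
    })
}
```

Here demo() returns 42. Our generated code relies on the same property: its use statement is wrapped in a block, so it does not leak into the code that includes it.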
Second, we are about to use some very unfamiliar syntax for those of you who have not written macros before. While it may seem strange at first, the syntax here is widely used across the ecosystem. In particular, we are going to be using the repetition syntax of quote. This allows us to use our two collections of keys and values together.
Let’s do it:
    quote! {
        {
            use ::asset_bundler::{Assets, phf::{self, phf_map}};
            Assets::from(phf_map! {
                #( #keys => include_str!(#values) ),*
            })
        }
    }
While the syntax is surely a departure from normal Rust code, hopefully you are able to recognize some familiar patterns we already went over. Here’s a side-by-side comparison to the phf_map! example we did before:
    let keys = ["key1", "key2", "key3"];
    let values = ["value1", "value2", "value3"];

    quote! {
        phf_map! {
            #( #keys => include_str!(#values) ),*
        }
    }

    // turns into this
    phf_map! {
        "key1" => include_str!("value1"),
        "key2" => include_str!("value2"),
        "key3" => include_str!("value3")
    }
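If the repetition syntax still feels opaque, it may help to see the same idea written with plain strings. The sketch below is not how quote works internally (quote operates on tokens, not text), and emulate_repetition is a hypothetical helper; it only mimics the lockstep walk over keys and values with zip and join.

```rust
/// Emulate `#( #keys => include_str!(#values) ),*` with plain strings.
/// Each key/value pair becomes one entry, and entries are comma separated.
fn emulate_repetition(keys: &[&str], values: &[&str]) -> String {
    let entries: Vec<String> = keys
        .iter()
        .zip(values)
        // `{:?}` wraps each string in quotes, like a string literal
        .map(|(key, value)| format!("{key:?} => include_str!({value:?})"))
        .collect();
    format!("phf_map! {{ {} }}", entries.join(", "))
}
```

For the keys key1 and key2 with the values value1 and value2, this produces the text phf_map! { "key1" => include_str!("value1"), "key2" => include_str!("value2") }. Building code from strings like this gets error-prone quickly, which is a large part of why quote is the usual choice.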
With all that out of the way, let’s plug that into the generate_code function we created earlier to see how it interacts with the rest of the code. Inside of codegen/src/lib.rs:
    /// Generate Rust code to create an [`asset_bundler::Assets`].
    fn generate_code(paths: &[PathBuf], base: &Path) -> String {
        let keys = keys(paths, base);
        let values = paths.iter().map(|p| p.to_string_lossy());

        // double brackets to make it a block expression
        let output = quote! {
            {
                use ::asset_bundler::{Assets, phf::{self, phf_map}};
                Assets::from(phf_map! {
                    #( #keys => include_str!(#values) ),*
                })
            }
        };

        output.to_string()
    }

    /// Turn paths into relative paths suitable for keys.
    fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
        let mut keys = Vec::new();
        for path in paths {
            // ignore this failure case for this example
            if let Ok(key) = path.strip_prefix(base) {
                keys.push(key.to_string_lossy().into())
            }
        }
        keys
    }
Phew! That actually wraps up the codegen library. I’ll drop the full codegen/src/lib.rs here, and then we can skedaddle to actually using what we just worked on:
    use quote::quote;
    use std::path::{Path, PathBuf};
    use walkdir::WalkDir;

    /// Generate Rust code to create an [`asset_bundler::Assets`] from the passed path.
    pub fn codegen(path: &Path) -> std::io::Result<String> {
        // canonicalize also checks if the path exists
        // which is the only case that makes sense for us
        let base = path.canonicalize()?;
        let paths = gather_asset_paths(&base);
        Ok(generate_code(&paths, &base))
    }

    /// Recursively find all files in the passed directory.
    fn gather_asset_paths(base: &Path) -> Vec<PathBuf> {
        let mut paths = Vec::new();
        for entry in WalkDir::new(base).into_iter().flatten() {
            // we only care about files, ignore directories
            if entry.file_type().is_file() {
                paths.push(entry.into_path())
            }
        }
        paths
    }

    /// Generate Rust code to create an [`asset_bundler::Assets`].
    fn generate_code(paths: &[PathBuf], base: &Path) -> String {
        let keys = keys(paths, base);
        let values = paths.iter().map(|p| p.to_string_lossy());

        // double brackets to make it a block expression
        let output = quote! {
            {
                use ::asset_bundler::{Assets, phf::{self, phf_map}};
                Assets::from(phf_map! {
                    #( #keys => include_str!(#values) ),*
                })
            }
        };

        output.to_string()
    }

    /// Turn paths into relative paths suitable for keys.
    fn keys(paths: &[PathBuf], base: &Path) -> Vec<String> {
        let mut keys = Vec::new();
        for path in paths {
            // ignore this failure case for this example
            if let Ok(key) = path.strip_prefix(base) {
                keys.push(key.to_string_lossy().into())
            }
        }
        keys
    }
Using it
We just made a simple asset bundler in 50 lines of code, and it’s time to use it! We will start off with creating a new example project to consume the two libraries we just created.
First, add a new item to the root Cargo.toml:
    [workspace]
    members = ["codegen", "example"]
Then, we create the example binary and add our dependencies:
    cargo new --bin example
    cargo add asset-bundler --path . --package example
    cargo add --build asset-bundler-codegen --path codegen --package example
    touch example/build.rs
    mkdir -p example/assets/scripts
Let’s start off the Rust code with the build script, since we just created our codegen library. We will call the codegen function we created earlier to get the generated code, and then write this generated Rust code somewhere our other code can use it. This is going into our example/build.rs:
    use std::path::Path;

    fn main() {
        let assets = Path::new("assets");
        let codegen = match asset_bundler_codegen::codegen(assets) {
            Ok(codegen) => codegen,
            Err(err) => panic!("failed to generate asset bundler codegen: {err}"),
        };

        let out = std::env::var("OUT_DIR").unwrap();
        let out = Path::new(&out).join("assets.rs");
        std::fs::write(out, codegen.as_bytes()).unwrap();
    }
We ended up writing the code to $OUT_DIR/assets.rs because Cargo sets $OUT_DIR for build scripts to a directory that is unique for each crate, and for each version of the same crate. The path we just wrote to will be important in just a second, but first let’s create some assets to actually use.
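If you would rather not unwrap inside the build script, the write step can be pulled into a small helper that returns an io::Result. This is only a sketch: write_generated is a hypothetical function, and it assumes the caller passes in the directory read from the OUT_DIR environment variable.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write generated source to `assets.rs` inside `out_dir` and return the full path.
/// In a build script, `out_dir` would be the value of the `OUT_DIR` variable.
fn write_generated(out_dir: &Path, code: &str) -> io::Result<PathBuf> {
    let target = out_dir.join("assets.rs");
    fs::write(&target, code)?;
    Ok(target)
}
```

Taking the directory as a parameter also makes the helper easy to exercise against a temporary directory, outside of a real build.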
We want to create some assets that are somewhat representative of the example we used at the start. In this case, let’s imagine that these assets are for a web server and the files are served to the browser. This article isn’t the place for implementing the server, but we will mimic each file’s dependencies by using the path of the asset it requires as its contents. Run these commands to create them:
    echo -n "scripts/loader-a1b2c3.js" > example/assets/index.html
    echo -n "scripts/dashboard-f0e9d8.js" > example/assets/scripts/loader-a1b2c3.js
    echo -n "console.log('dashboard stuff')" > example/assets/scripts/dashboard-f0e9d8.js
It’s time to put it together and get a glimpse of how it works! We set up the examples so that there is only a single “always known” filename, index.html. Our goal is to get the content of that dashboard script using only an index.html literal. Here we will jump from each asset to the next in example/src/main.rs:
    fn main() {
        // include the assets our build script created
        let assets = include!(concat!(env!("OUT_DIR"), "/assets.rs"));

        let index = assets.get("index.html").unwrap();
        let loader = assets.get(index).unwrap();
        let dashboard = assets.get(loader).unwrap();

        assert_eq!(dashboard, "console.log('dashboard stuff')");
    }
Don’t forget, you can see all the code on GitHub.
That’s it!
A very bare-bones asset bundler in 94 lines of code, including the example. Treating code generation like any other Rust code is an important aspect to keeping it understandable and maintainable. In those 90 lines of code, there were only a handful of lines for doing actual code generation. Let’s break down what we did…
We created the asset-bundler crate that provides the Assets type and re-exports phf to ensure that the generated code can use it.
We created the asset-bundler-codegen crate to hold all the functionality codegen uses, along with providing a public function codegen to utilize it.
We created the example build script to call the codegen function on its own assets. The generated code was written to a file which we then included in our example/src/main.rs.
While having a separate crate isn’t necessary for code generation in build scripts specifically, it is very common. Not only does it help separate concerns and prevent unused dependencies, it also helps prevent circular dependencies on more complex projects. Having a separate crate is required for performing code generation with procedural macros.
Code generation is a powerful tool to bring advanced functionality to your Rust programs. Our example from earlier, Tauri, uses it extensively to perform code injection, compression, and validation for its own asset bundling.
Demystify code generation by writing it as regular Rust code, empowering you to build powerful software.
Author: Chip Reed, Security Engineer at CrabNebula