This is not a replacement for the LLVM documentation, but a collection of tips for working on LLVM for Julia.
Julia dynamically links against LLVM by default. Build with `USE_LLVM_SHLIB=0` to link statically.

The code for lowering Julia AST to LLVM IR or interpreting it directly is in the directory `src/`.
| File | Description |
|---|---|
| `aotcompile.cpp` | Compiler C-interface entry and object file emission |
| `builtins.c` | Builtin functions |
| `ccall.cpp` | Lowering `ccall` |
| `cgutils.cpp` | Lowering utilities, notably for array and tuple accesses |
| `codegen.cpp` | Top-level of code generation, pass list, lowering builtins |
| `debuginfo.cpp` | Tracks debug information for JIT code |
| `disasm.cpp` | Handles native object file and JIT code disassembly |
| `gf.c` | Generic functions |
| `intrinsics.cpp` | Lowering intrinsics |
| `jitlayers.cpp` | JIT-specific code, ORC compilation layers/utilities |
| `llvm-alloc-helpers.cpp` | Julia-specific escape analysis |
| `llvm-alloc-opt.cpp` | Custom LLVM pass to demote heap allocations to the stack |
| `llvm-cpufeatures.cpp` | Custom LLVM pass to lower CPU-based functions (e.g. `haveFMA`) |
| `llvm-demote-float16.cpp` | Custom LLVM pass to lower 16-bit float operations to 32-bit float operations |
| `llvm-final-gc-lowering.cpp` | Custom LLVM pass to lower GC calls to their final form |
| `llvm-gc-invariant-verifier.cpp` | Custom LLVM pass to verify Julia GC invariants |
| `llvm-julia-licm.cpp` | Custom LLVM pass to hoist/sink Julia-specific intrinsics |
| `llvm-late-gc-lowering.cpp` | Custom LLVM pass to root GC-tracked values |
| `llvm-lower-handlers.cpp` | Custom LLVM pass to lower try-catch blocks |
| `llvm-muladd.cpp` | Custom LLVM pass for fast-math FMA |
| `llvm-multiversioning.cpp` | Custom LLVM pass to generate sysimg code for multiple architectures |
| `llvm-propagate-addrspaces.cpp` | Custom LLVM pass to canonicalize addrspaces |
| `llvm-ptls.cpp` | Custom LLVM pass to lower TLS operations |
| `llvm-remove-addrspaces.cpp` | Custom LLVM pass to remove Julia addrspaces |
| `llvm-remove-ni.cpp` | Custom LLVM pass to remove Julia non-integral addrspaces |
| `llvm-simdloop.cpp` | Custom LLVM pass for `@simd` |
| `pipeline.cpp` | New pass manager pipeline, pass pipeline parsing |
| `sys.c` | I/O and operating system utility functions |
Some of the `.cpp` files form a group that compiles to a single object.
The difference between an intrinsic and a builtin is that a builtin is a first-class function that can be used like any other Julia function. An intrinsic can operate only on unboxed data, and therefore its arguments must be statically typed.
Julia currently uses LLVM's Type Based Alias Analysis. To find the comments that document the inclusion relationships, look for `static MDNode*` in `src/codegen.cpp`.

The `-O` option enables LLVM's Basic Alias Analysis.
The default version of LLVM is specified in `deps/llvm.version`. You can override it by creating a file called `Make.user` in the top-level directory and adding a line to it such as:

```
LLVM_VER = 13.0.0
```
Besides the LLVM release numerals, you can also use `DEPS_GIT = llvm` in combination with `USE_BINARYBUILDER_LLVM = 0` to build against the latest development version of LLVM.
You can also build a debug version of LLVM by setting either `LLVM_DEBUG = 1` or `LLVM_DEBUG = Release` in your `Make.user` file. The former produces a fully unoptimized build of LLVM, while the latter produces an optimized build. Depending on your needs, the optimized build may suffice, and it is quite a bit faster. If you use `LLVM_DEBUG = Release`, you will also want to set `LLVM_ASSERTIONS = 1` to enable diagnostics for different passes. Only `LLVM_DEBUG = 1` implies that option by default.
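For example, a `Make.user` requesting an optimized LLVM build with assertions enabled might contain (a sketch; combine with the other options above as needed):

```
LLVM_DEBUG = Release
LLVM_ASSERTIONS = 1
```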
You can pass options to LLVM via the environment variable `JULIA_LLVM_ARGS`. Here are example settings using `bash` syntax:

  * `export JULIA_LLVM_ARGS=-print-after-all` dumps IR after each pass.
  * `export JULIA_LLVM_ARGS=-debug-only=loop-vectorize` dumps LLVM `DEBUG(...)` diagnostics for the loop vectorizer. If you get warnings about "Unknown command line argument", rebuild LLVM with `LLVM_ASSERTIONS = 1`.
  * `export JULIA_LLVM_ARGS=-help` shows a list of available options. `export JULIA_LLVM_ARGS=-help-hidden` shows even more.
  * `export JULIA_LLVM_ARGS="-fatal-warnings -print-options"` is an example of how to use multiple options.

### `JULIA_LLVM_ARGS` parameters

  * `-print-after=PASS`: prints the IR after any execution of `PASS`, useful for checking changes done by a pass.
  * `-print-before=PASS`: prints the IR before any execution of `PASS`, useful for checking the input to a pass.
  * `-print-changed`: prints the IR whenever a pass changes the IR, useful for narrowing down which passes are causing problems.
  * `-print-(before|after)=MARKER-PASS`: the Julia pipeline ships with a number of marker passes, which can be used to identify where problems or optimizations are occurring. A marker pass is a pass that appears once in the pipeline and performs no transformations on the IR; it is only useful as a target for print-before/print-after.
  * `-time-passes`: prints the time spent in each pass, useful for identifying which passes are taking a long time.
  * `-print-module-scope`: used in conjunction with `-print-(before|after)`, prints the entire module rather than the IR unit received by the pass.
  * `-debug`: prints out a lot of debugging information throughout LLVM.
  * `-debug-only=NAME`: prints out debugging statements from files with `DEBUG_TYPE` defined to `NAME`, useful for getting additional context about a problem.

On occasion, it can be useful to debug LLVM's transformations in isolation from the rest of the Julia system, e.g. because reproducing the issue inside `julia` would take too long, or because one wants to take advantage of LLVM's tooling (e.g. bugpoint).
To start with, you can install the developer tools to work with LLVM via:

```
make -C deps install-llvm-tools
```
To get unoptimized IR for the entire system image, pass the `--output-unopt-bc unopt.bc` option to the system image build process, which will output the unoptimized IR to an `unopt.bc` file. This file can then be passed to LLVM tools as usual. `libjulia` can function as an LLVM pass plugin and can be loaded into LLVM tools, to make Julia-specific passes available in this environment. In addition, it exposes the `-julia` meta-pass, which runs the entire Julia pass pipeline over the IR. As an example, to generate a system image with the old pass manager, one could do:

```
opt -enable-new-pm=0 -load libjulia-codegen.so -julia -o opt.bc unopt.bc
llc -o sys.o opt.bc
cc -shared -o sys.so sys.o
```
To generate a system image with the new pass manager, one could do:

```
opt -load-pass-plugin=libjulia-codegen.so --passes='julia' -o opt.bc unopt.bc
llc -o sys.o opt.bc
cc -shared -o sys.so sys.o
```

This system image can then be loaded by `julia` as usual.
It is also possible to dump an LLVM IR module for just one Julia function, using:
```julia
fun, T = +, Tuple{Int,Int} # Substitute your function of interest here
optimize = false
open("plus.ll", "w") do file
    println(file, InteractiveUtils._dump_function(fun, T, false, false, false, true, :att, optimize, :default, false))
end
```
These files can be processed the same way as the unoptimized sysimg IR shown above.
To run the LLVM tests locally, you need to first install the tools and build `julia`; then you can run the tests:

```
make -C deps install-llvm-tools
make -j julia-src-release
make -C test/llvmpasses
```
If you want to run the individual test files directly, via the commands at the top of each test file, the first step here will have installed the tools into `./usr/tools/opt`. Then you'll want to manually replace `%s` with the name of the test file.
Improving LLVM code generation usually involves either changing Julia lowering to be more friendly to LLVM's passes, or improving a pass.
If you are planning to improve a pass, be sure to read the LLVM developer policy. The best strategy is to create a code example in a form where you can use LLVM's `opt` tool to study it and the pass of interest in isolation.

 1. Create an example Julia code of interest.
 2. Use `JULIA_LLVM_ARGS=-print-after-all` to dump the IR.
 3. Pick out the IR at the point just before the pass of interest runs.
 4. Strip the debug metadata and fix up the TBAA metadata by hand.

The last step is labor intensive. Suggestions on a better way would be appreciated.
Julia has a generic calling convention for unoptimized code, which looks somewhat as follows:

```c
jl_value_t *any_unoptimized_call(jl_value_t *, jl_value_t **, int);
```

where the first argument is the boxed function object, the second argument is an on-stack array of arguments, and the third is the number of arguments. Now, we could perform a straightforward lowering and emit an alloca for the argument array. However, this would betray the SSA nature of the uses at the call site, making optimizations (including GC root placement) significantly harder. Instead, we emit it as follows:

```llvm
call %jl_value_t *@julia.call(jl_value_t *(*)(...) @any_unoptimized_call, %jl_value_t *%arg1, %jl_value_t *%arg2)
```
This allows us to retain the SSA-ness of the uses throughout the optimizer. GC root placement will later lower this call to the original C ABI.
GC root placement is done by an LLVM pass late in the pass pipeline. Doing GC root placement this late enables LLVM to make more aggressive optimizations around code that requires GC roots, as well as allowing us to reduce the number of required GC roots and GC root store operations (since LLVM doesn't understand our GC, it wouldn't otherwise know what it is and is not allowed to do with values stored to the GC frame, so it'll conservatively do very little). As an example, consider an error path:

```julia
if some_condition()
    #= Use some variables maybe =#
    error("An error occurred")
end
```
During constant folding, LLVM may discover that the condition is always false, and can remove the basic block. However, if GC root lowering is done early, the GC root slots used in the deleted block, as well as any values kept alive in those slots only because they were used in the error path, would be kept alive by LLVM. By doing GC root lowering late, we give LLVM the license to do any of its usual optimizations (constant folding, dead code elimination, etc.), without having to worry (too much) about which values may or may not be GC tracked.
However, in order to be able to do late GC root placement, we need to be able to identify a) which pointers are GC tracked and b) all uses of such pointers. The goal of the GC placement pass is thus simple:
Minimize the number of needed GC roots/stores to them subject to the constraint that at every safepoint, any live GC-tracked pointer (i.e. for which there is a path after this point that contains a use of this pointer) is in some GC slot.
The primary difficulty is thus choosing an IR representation that allows us to identify GC-tracked pointers and their uses, even after the program has been run through the optimizer. Our design makes use of three LLVM features to achieve this:
Custom address spaces allow us to tag every pointer with an integer that needs to be preserved through optimizations. The compiler may not insert casts between address spaces that did not exist in the original program, and it must never change the address space of a pointer in a load/store/etc. operation. This allows us to annotate which pointers are GC-tracked in an optimizer-resistant way. Note that metadata would not be able to achieve the same purpose. Metadata is supposed to always be discardable without altering the semantics of the program. However, failing to identify a GC-tracked pointer alters the resulting program behavior dramatically - it'll probably crash or return wrong results. We currently use three different address spaces (their numbers are defined in `src/codegen_shared.cpp`):

  * **Tracked** (currently 10): pointers to boxed values that may be put into a GC frame. This is loosely equivalent to a `jl_value_t*` pointer on the C side. N.B. It is illegal to ever have a pointer in this address space that may not be stored to a GC slot.
  * **Derived** (currently 11): pointers derived from some tracked pointer. Uses of these pointers generate uses of the original pointer, but they need not themselves be known to the GC.
  * **CalleeRooted** (currently 12): a utility address space expressing the notion of a callee-rooted value, which need not be rooted by the caller when passed to a call.

The GC root placement pass makes use of several invariants, which need to be observed by the frontend and are preserved by the optimizer.
First, only the following address space casts are allowed:

  * `0 -> {Tracked, Derived, CalleeRooted}`: It is allowable to decay an untracked pointer to any of the other address spaces. However, note that the optimizer has broad license to not root such a value.
  * `Tracked -> Derived`: This is the standard decay route for interior values. The placement pass will look for these to identify the base pointer for any use.
  * `Tracked -> CalleeRooted`: Addrspace CalleeRooted serves merely as a hint that a GC root is not required. Note that the `Derived -> CalleeRooted` decay is prohibited, since pointers should generally be storable to a GC slot, even in this address space.
Now let us consider what constitutes a use:

  * Loads whose loaded value is in one of the address spaces
  * Stores of a value in one of the address spaces to a location
  * Stores to a pointer in one of the address spaces
  * Calls for which a value in one of the address spaces is an operand
  * Calls in jlcall ABI, for which the argument array contains a value
  * Return instructions
We explicitly allow load/stores and simple calls in address spaces Tracked/Derived. Elements of jlcall argument arrays must always be in address space Tracked (it is required by the ABI that they be valid `jl_value_t*` pointers). The same is true for return instructions (though note that struct return arguments are allowed to have any of the address spaces). The only allowable use of an address space CalleeRooted pointer is to pass it to a call (which must have an appropriately typed operand).
Further, we disallow `getelementptr` in addrspace Tracked. This is because, unless the operation is a no-op, the resulting pointer will not be validly storable to a GC slot and may thus not be in this address space. If such a pointer is required, it should be decayed to addrspace Derived first.
Lastly, we disallow `inttoptr`/`ptrtoint` instructions in these address spaces. Having these instructions would mean that some `i64` values are really GC tracked. This is problematic, because it breaks the stated requirement that we be able to identify GC-relevant pointers. This invariant is accomplished using the LLVM "non-integral pointers" feature, which is new in LLVM 5.0. It prohibits the optimizer from making optimizations that would introduce these operations. Note that we can still insert static constants at JIT time by using `inttoptr` in address space 0 and then decaying to the appropriate address space afterwards.
### Supporting `ccall`

One important aspect missing from the discussion so far is the handling of `ccall`. `ccall` has the peculiar feature that the location and scope of a use do not coincide. As an example, consider:
```julia
A = randn(1024)
ccall(:foo, Cvoid, (Ptr{Float64},), A)
```
In lowering, the compiler will insert a conversion from the array to the pointer which drops the reference to the array value. However, we of course need to make sure that the array does stay alive while we're doing the `ccall`. To understand how this is done, let's look at a hypothetical approximate possible lowering of the above code:
return $(Expr(:foreigncall, :(:foo), Cvoid, svec(Ptr{Float64}), 0, :(:ccall), Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), 0, :(:ccall), :(A)), :(A)))
The last `:(A)` is an extra argument list inserted during lowering that informs the code generator which Julia-level values need to be kept alive for the duration of this `ccall`. We then take this information and represent it in an "operand bundle" at the IR level. An operand bundle is essentially a fake use that is attached to the call site. At the IR level, this looks like so:
```llvm
call void inttoptr (i64 ... to void (double*)*)(double* %5) [ "jl_roots"(%jl_value_t addrspace(10)* %A) ]
```
The GC root placement pass will treat the `jl_roots` operand bundle as if it were a regular operand. However, as a final step, after the GC roots are inserted, it will drop the operand bundle to avoid confusing instruction selection.
### `pointer_from_objref`

`pointer_from_objref` is special because it requires the user to take explicit control of GC rooting. By our above invariants, this function is illegal, because it performs an address space cast from 10 to 0. However, it can be useful in certain situations, so we provide a special intrinsic:
```llvm
declare %jl_value_t *julia.pointer_from_objref(%jl_value_t addrspace(10)*)
```
which is lowered to the corresponding address space cast after GC root lowering. Do note however that by using this intrinsic, the caller assumes all responsibility for making sure that the value in question is rooted. Further this intrinsic is not considered a use, so the GC root placement pass will not provide a GC root for the function. As a result, the external rooting must be arranged while the value is still tracked by the system. I.e. it is not valid to attempt to use the result of this operation to establish a global root - the optimizer may have already dropped the value.
In certain cases it is necessary to keep an object alive, even though there is no compiler-visible use of said object. This may be the case for low-level code that operates on the memory representation of an object directly, or code that needs to interface with C code. In order to allow this, we provide the following intrinsics at the LLVM level:
```llvm
token @llvm.julia.gc_preserve_begin(...)
void @llvm.julia.gc_preserve_end(token)
```
(The `llvm.` in the name is required in order to be able to use the `token` type.) The semantics of these intrinsics are as follows: at any safepoint that is dominated by a `gc_preserve_begin` call, but that is not dominated by a corresponding `gc_preserve_end` call (i.e. a call whose argument is the token returned by a `gc_preserve_begin` call), the values passed as arguments to that `gc_preserve_begin` will be kept live. Note that the `gc_preserve_begin` still counts as a regular use of those values, so the standard lifetime semantics will ensure that the values will be kept alive before entering the preserve region.
This document was generated with Documenter.jl version 1.8.0 on Wednesday 9 July 2025, using Julia version 1.11.6.