Whole Program LLVM: wllvm ported to go
SRI-CSL/gllvm
TL;DR: A drop-in replacement for wllvm that builds the bitcode in parallel and is faster. A comparison between the two tools can be gleaned from building the Linux kernel.
wllvm command/env variable | gllvm command/env variable |
---|---|
wllvm | gclang |
wllvm++ | gclang++ |
wfortran | gflang |
extract-bc | get-bc |
wllvm-sanity-checker | gsanity-check |
LLVM_COMPILER_PATH | LLVM_COMPILER_PATH |
LLVM_CC_NAME ... | LLVM_CC_NAME ... |
LLVM_F_NAME | |
WLLVM_CONFIGURE_ONLY | WLLVM_CONFIGURE_ONLY |
WLLVM_OUTPUT_LEVEL | WLLVM_OUTPUT_LEVEL |
WLLVM_OUTPUT_FILE | WLLVM_OUTPUT_FILE |
LLVM_COMPILER | not supported (clang only) |
LLVM_GCC_PREFIX | not supported (clang only) |
LLVM_DRAGONEGG_PLUGIN | not supported (clang only) |
LLVM_LINK_FLAGS | LLVM_LINK_FLAGS |
This project, gllvm, provides tools for building whole-program (or whole-library) LLVM bitcode files from an unmodified C or C++ source package. It currently runs on *nix platforms such as Linux, FreeBSD, and Mac OS X. It is a Go port of wllvm.
gllvm provides compiler wrappers that work in two phases. The wrappers first invoke the compiler as normal. Then, for each object file, they call a bitcode compiler to produce LLVM bitcode. The wrappers then store the location of the generated bitcode file in a dedicated section of the object file. When object files are linked together, the contents of the dedicated sections are concatenated (so we don't lose the locations of any of the constituent bitcode files). After the build completes, one can use a gllvm utility to read the contents of the dedicated section and link all of the bitcode into a single whole-program bitcode file. This utility works for both executables and native libraries.
For more details, see wllvm.
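To make the "concatenated sections" idea concrete, here is a minimal Go sketch of what the extraction side has to do with the dedicated section once the linker has merged it. It assumes, for illustration only, that each object file contributes one absolute bitcode path terminated by a newline; bitcodePaths is a hypothetical helper, not part of gllvm's API.

```go
package main

import (
	"fmt"
	"strings"
)

// bitcodePaths splits the concatenated contents of the dedicated section
// into individual bitcode file paths (hypothetical helper).
// Assumption: each object file contributed one path followed by '\n'.
// Empty entries (for example, zero padding a linker may insert for
// alignment) are skipped, and duplicates are dropped while keeping the
// first-seen order.
func bitcodePaths(section []byte) []string {
	seen := make(map[string]bool)
	var paths []string
	for _, line := range strings.Split(string(section), "\n") {
		p := strings.Trim(line, "\x00 \t\r")
		if p == "" || seen[p] {
			continue
		}
		seen[p] = true
		paths = append(paths, p)
	}
	return paths
}

func main() {
	// Two object files linked together: their section contents are simply
	// appended, so both bitcode paths survive in the final binary.
	linked := []byte("/build/a.bc\n/build/b.bc\n")
	fmt.Println(bitcodePaths(linked))
}
```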
To install gllvm, you need the Go toolchain.
To use gllvm, you need clang/clang++/flang and the LLVM tools llvm-link and llvm-ar. gllvm is agnostic to the actual LLVM version. gllvm also relies on standard build tools such as objcopy and ld.
To install, simply run the following (making sure to include the trailing ... in the package path):
go install github.com/SRI-CSL/gllvm/cmd/...@latest
This should install six binaries, gclang, gclang++, gflang, get-bc, gparse, and gsanity-check, in the $GOPATH/bin directory.
gclang and gclang++ are the wrappers used to compile C and C++. gflang is the wrapper used to compile Fortran. get-bc is used for extracting the bitcode from a build product (an object file, executable, library, or archive). gsanity-check can be used for detecting configuration errors. gparse can be used to examine how gllvm parses compiler/linker lines.
Here is a simple example. Assuming that clang is in your PATH, you can build bitcode for pkg-config as follows:
tar xf pkg-config-0.26.tar.gz
cd pkg-config-0.26
CC=gclang ./configure
make
This should produce the executable pkg-config. To extract the bitcode:
get-bc pkg-config
which will produce the bitcode module pkg-config.bc. For more on this example, see here.
If clang and the LLVM tools are not in your PATH, you will need to set some environment variables.
LLVM_COMPILER_PATH can be set to the absolute path of the directory that contains the compiler and the other LLVM tools to be used. LLVM_CC_NAME can be set if your clang compiler is not called clang but something like clang-3.7. Similarly, LLVM_CXX_NAME and LLVM_F_NAME can be used to describe what the C++ and Fortran compilers are called, respectively. We also pay attention to the environment variables LLVM_LINK_NAME and LLVM_AR_NAME in an analogous way.
Another useful, and sometimes necessary, environment variable is WLLVM_CONFIGURE_ONLY.
WLLVM_CONFIGURE_ONLY can be set to anything. If it is set, gclang and gclang++ behave like a normal C or C++ compiler and do not produce bitcode. Setting WLLVM_CONFIGURE_ONLY may prevent configuration errors caused by the unexpected production of hidden bitcode files, and it is sometimes required when configuring a build. For example:
WLLVM_CONFIGURE_ONLY=1 CC=gclang ./configure
make
The get-bc tool is used to extract the bitcode from a build artifact, such as an executable, object file, thin archive, archive, or library. In the simplest use case, as seen above, one simply does:
get-bc -o <name of bitcode file> <path to executable>
This will produce the desired bitcode file. The situation is similar for an object file. For an archive or library, there is a choice as to whether you produce a bitcode module or a bitcode archive. This choice is made by using the -b switch.
Another useful switch is -m, which, in addition to producing the bitcode, also produces a manifest of the bitcode files that made up the final product. As is typical,
get-bc -h
will list all the command-line switches. Since we use the golang flag module, the switches must precede the artifact path.
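The ordering rule follows from how Go's standard flag package works: parsing stops at the first non-flag argument, and everything after it is left in Args(). The sketch below reproduces that behaviour with a FlagSet whose -o, -b, and -m switches only mirror the ones described above; it is an illustration, not get-bc's actual option handling.

```go
package main

import (
	"flag"
	"fmt"
)

// parse mimics a get-bc-style command line using Go's flag package.
// It returns the -m and -o values plus the remaining positional arguments.
func parse(args []string) (bool, string, []string, error) {
	fs := flag.NewFlagSet("get-bc-sketch", flag.ContinueOnError)
	m := fs.Bool("m", false, "also write a manifest")
	o := fs.String("o", "", "name of the output bitcode file")
	fs.Bool("b", false, "produce a bitcode archive")
	if err := fs.Parse(args); err != nil {
		return false, "", nil, err
	}
	return *m, *o, fs.Args(), nil
}

func main() {
	// Switches first: -m and -o are recognised.
	fmt.Println(parse([]string{"-m", "-o", "pkg.bc", "pkg-config"}))
	// Switch last: parsing stops at "pkg-config", so -m is never parsed
	// and is left among the positional arguments.
	fmt.Println(parse([]string{"pkg-config", "-m"}))
}
```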
Sometimes, because of pathological build systems, it can be useful to preserve the bitcode files produced in a build, either to prevent their deletion or to retrieve them later. If the environment variable WLLVM_BC_STORE is set to the absolute path of an existing directory, then gllvm will copy each produced bitcode file into that directory. The name of the copied bitcode file is the hash of the path to the original bitcode file. For convenience, when using both the manifest feature of get-bc and the store, the manifest will contain both the original path and the store path.
The gllvm tools can show various levels of output to aid with debugging. To show this output, set the WLLVM_OUTPUT_LEVEL environment variable to one of the following levels:
ERROR
WARNING
AUDIT
INFO
DEBUG
For example:
export WLLVM_OUTPUT_LEVEL=DEBUG
Output will be directed to the standard error stream, unless you specify the path of a log file via the WLLVM_OUTPUT_FILE environment variable. The AUDIT level, new in 2022, logs only the calls to the compiler, indicating whether each call is compiling or linking, the compiler used, and the arguments provided.
For example:
export WLLVM_OUTPUT_FILE=/tmp/gllvm.log
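The two variables can be pictured as a level filter plus a destination. The Go sketch below assumes the levels are ordered as listed (ERROR least verbose, DEBUG most verbose), so each level also shows the ones before it; that ordering, the "unset logs nothing" behaviour, and the helper names are assumptions, not gllvm's documented logic.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// levels lists the WLLVM_OUTPUT_LEVEL values in the order given above.
// Assumption: that order runs from least to most verbose.
var levels = []string{"ERROR", "WARNING", "AUDIT", "INFO", "DEBUG"}

// rank returns a level's position, or -1 if it is unknown or empty.
func rank(level string) int {
	for i, l := range levels {
		if l == level {
			return i
		}
	}
	return -1
}

// enabled reports whether a message at msgLevel passes the configured
// level. An unknown or unset configured level logs nothing in this sketch.
func enabled(configured, msgLevel string) bool {
	c, m := rank(configured), rank(msgLevel)
	return c >= 0 && m >= 0 && m <= c
}

// logWriter opens WLLVM_OUTPUT_FILE for appending if it is set, and
// otherwise returns standard error.
func logWriter() (io.Writer, error) {
	if path := os.Getenv("WLLVM_OUTPUT_FILE"); path != "" {
		f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
		if err != nil {
			return nil, err
		}
		return f, nil
	}
	return os.Stderr, nil
}

func main() {
	w, err := logWriter()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if enabled(os.Getenv("WLLVM_OUTPUT_LEVEL"), "AUDIT") {
		// Illustrative message only; not gllvm's actual log format.
		fmt.Fprintln(w, "AUDIT compile: clang -c foo.c -o foo.o")
	}
}
```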
gllvm does not support the dragonegg plugin.
Too many environment variables? Try doing a sanity check:
gsanity-check
it might point out what is wrong.
Both the wllvm and gllvm toolsets do much the same thing, but the way they do it is slightly different. The gllvm toolset's code base is written in golang, and is largely derived from wllvm's Python code base.
Both generate object files and bitcode files using the compiler. wllvm can use gcc and dragonegg; gllvm can only use clang. The gllvm toolset does these two tasks in parallel, while wllvm does them sequentially. This, together with the slowness of Python's fork-exec-ing and its interpreted nature, accounts for the large efficiency gap between the two toolsets.
Both inject the path of the bitcode version of the .o file into a dedicated segment of the .o file itself. This segment is the same across toolsets, so extracting the bitcode can be done by the appropriate tool in either toolset. On *nix both toolsets use objcopy to add the segment, while on OS X they use ld.
When the object files are linked into the resulting library or executable, the bitcode path segments are appended, so the resulting binary contains the paths of all the bitcode files that constitute the binary. To extract the sections, the gllvm toolset uses the golang packages "debug/elf" and "debug/macho", while the wllvm toolset uses objdump on *nix and otool on OS X.
Both tools then use llvm-link or llvm-ar to combine the bitcode files into the desired form.
You can specify the exact version of objcopy and ld that gllvm uses to manipulate the artifacts by setting the GLLVM_OBJCOPY and GLLVM_LD environment variables. For more details of what's under the gllvm hood, try
gsanity-check -e
In some situations it is desirable to pass certain flags to clang in the step that produces the bitcode. This can be done by setting the LLVM_BITCODE_GENERATION_FLAGS environment variable to the desired flags, for example "-flto -fwhole-program-vtables".
In other situations it is desirable to pass certain flags to llvm-link in the step that merges multiple individual bitcode files together (i.e., within get-bc). This can be done by setting the LLVM_LINK_FLAGS environment variable to the desired flags, for example "-internalize -only-needed".
If the package you are building happens to take advantage of recent clang developments such as link time optimization (indicated by the presence of the compiler flag -flto), then your build is unlikely to produce anything that get-bc will work on. This is to be expected. When working under these flags, the compiler actually produces object files that are bitcode; your only recourse here is to try to save these object files and retrieve them yourself. This can be done by setting the LTO_LINKING_FLAGS environment variable to something like "-g -Wl,-plugin-opt=save-temps", which will be appended to the flags at link time. This will at least preserve the bitcode files, even if get-bc will not be able to retrieve them for you.
When cross-compiling a project (i.e., you pass the --target= or -target flag to the compiler), you'll need to set the GLLVM_OBJCOPY variable to either
llvm-objcopy, to use LLVM's objcopy, which naturally supports all targets that clang does, or
YOUR-TARGET-TRIPLE-objcopy, to use GNU's objcopy, since plain objcopy only supports the native architecture.
Example:
# test program
echo 'int main() { return 0; }' > a.c
clang --target=aarch64-linux-gnu a.c  # works
gclang --target=aarch64-linux-gnu a.c  # breaks
GLLVM_OBJCOPY=llvm-objcopy gclang --target=aarch64-linux-gnu a.c  # works
GLLVM_OBJCOPY=aarch64-linux-gnu-objcopy gclang --target=aarch64-linux-gnu a.c  # works if you have GNU's arm64 toolchain
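Spotting a cross-compile only requires looking for the two flag spellings mentioned above. A hypothetical Go sketch that detects them and suggests setting GLLVM_OBJCOPY; it recognises --target=TRIPLE and -target TRIPLE and nothing else (for example, it does not expand response files).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// crossTarget returns the target triple if args request one via
// --target=TRIPLE or -target TRIPLE, and "" otherwise (hypothetical helper).
func crossTarget(args []string) string {
	for i, a := range args {
		if strings.HasPrefix(a, "--target=") {
			return strings.TrimPrefix(a, "--target=")
		}
		if a == "-target" && i+1 < len(args) {
			return args[i+1]
		}
	}
	return ""
}

func main() {
	if t := crossTarget(os.Args[1:]); t != "" && os.Getenv("GLLVM_OBJCOPY") == "" {
		fmt.Printf("cross-compiling for %s: consider GLLVM_OBJCOPY=llvm-objcopy or %s-objcopy\n", t, t)
	}
}
```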
Debugging usually boils down to looking in the logs, maybe adding a print statement or two. The additional executable gparse, installed along with gclang, gclang++, gflang, get-bc, and gsanity-check, can also help: it takes the command-line arguments to the compiler and outputs how it parsed them.
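As a rough illustration of the kind of question gparse answers, here is a much-simplified Go classifier for a compiler command line. It is a hypothetical sketch, not gparse's logic: it treats the real clang flags -c, -S, and -E as "compile only", skips the argument after -o, and counts every other non-dash argument as an input. Flags that take a separate value (such as -I dir) would be miscounted as inputs.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// classify gives a coarse description of a compiler invocation.
func classify(args []string) string {
	inputs := 0
	for i := 0; i < len(args); i++ {
		a := args[i]
		switch {
		case a == "-c" || a == "-S" || a == "-E":
			return "compile only"
		case a == "-o":
			i++ // skip the output file name
		case !strings.HasPrefix(a, "-"):
			inputs++
		}
	}
	if inputs == 0 {
		return "no inputs"
	}
	return "compile and/or link"
}

func main() {
	fmt.Println(classify(os.Args[1:]))
}
```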
gllvm is released under a BSD license. See the file LICENSE for details.
This material is based upon work supported by the National Science Foundation under Grant ACI-1440800. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.