| LLVM | |
|---|---|
| Original authors | Chris Lattner, Vikram Adve |
| Developer | LLVM Developer Group |
| Initial release | 2003 |
| Stable release | 21.1.8[2] |
| Written in | C++ |
| Operating system | Cross-platform |
| Type | Compiler |
| License | Apache License 2.0 with LLVM Exceptions (v9.0.0 or later);[3] legacy license: UIUC (BSD-style)[4] |
| Website | llvm.org |
LLVM is a set of compiler and toolchain technologies[5] that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes.[6] The name LLVM originally stood for Low Level Virtual Machine. However, the project has since expanded, and the name is no longer an acronym but an orphan initialism.[7]
LLVM is written in C++ and is designed for compile-time, link-time, and runtime optimization. Originally implemented for C and C++, the language-agnostic design of LLVM has since spawned a wide variety of frontends: languages with compilers that use LLVM (or which do not directly use LLVM but can generate compiled programs as LLVM IR) include ActionScript, Ada, C# for .NET,[8][9][10] Common Lisp,[11] Crystal, CUDA, D,[12] Delphi,[13] Dylan, Forth,[14] Fortran,[15] FreeBASIC, Free Pascal, Halide, Haskell, Idris,[16] Jai (only for optimized release builds), Java bytecode, Julia, Kotlin, LabVIEW's G language,[17][18] Objective-C, OpenCL,[19] Odin,[20] PicoLisp, PostgreSQL's SQL and PL/pgSQL,[21] Ruby,[22] Rust,[23] Scala,[24][25] Standard ML,[26] Swift, Wolfram Language,[27] Xojo, and Zig.
The LLVM project started in 2000 at the University of Illinois at Urbana–Champaign, under the direction of Vikram Adve and Chris Lattner. LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages. LLVM was released under the University of Illinois/NCSA Open Source License,[3] a permissive free software license. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses within Apple's development systems.[28] LLVM has been an integral part of Apple's Xcode development tools for macOS and iOS since Xcode 4 in 2011.[29]
In 2006, Lattner started working on a new project named Clang. The combination of the Clang frontend and LLVM backend is named Clang/LLVM or simply Clang.
The name LLVM was originally an initialism for Low Level Virtual Machine. However, the LLVM project evolved into an umbrella project that has little relationship to what most current developers think of as a virtual machine. This made the initialism "confusing" and "inappropriate", and since 2011 LLVM is "officially no longer an acronym",[30] but a brand that applies to the LLVM umbrella project.[31] The project encompasses the LLVM intermediate representation (IR), the LLVM debugger, the LLVM implementation of the C++ Standard Library (with full support of C++11 and C++14[32]), etc. LLVM is administered by the LLVM Foundation. Compiler engineer Tanya Lattner became its president in 2014[33] and was still president and Executive Director as of November 2025[update].[34]
"For designing and implementing LLVM", the Association for Computing Machinery presented Vikram Adve, Chris Lattner, and Evan Cheng with the 2012 ACM Software System Award.[35]
The project was originally available under the UIUC license. After the release of v9.0.0 in 2019,[36] LLVM was relicensed to the Apache License 2.0 with LLVM Exceptions.[3] As of November 2022[update], about 400 contributions had not been relicensed.[37][38]
LLVM can provide the middle layers of a complete compiler system, taking intermediate representation (IR) code from a compiler and emitting an optimized IR. This new IR can then be converted and linked into machine-dependent assembly language code for a target platform. LLVM can accept the IR from the GNU Compiler Collection (GCC) toolchain, allowing it to be used with a wide array of extant compiler frontends written for that project. LLVM itself can also be built with GCC, version 7.5 or later.[39]
LLVM can also generate relocatable machine code at compile time or link time, or even binary machine code at runtime.
LLVM supports a language-independent instruction set and type system.[6] Each instruction is in static single assignment form (SSA), meaning that each variable (called a typed register) is assigned once and then frozen. This helps simplify the analysis of dependencies among variables. LLVM allows code to be compiled statically, as it is under the traditional GCC system, or left for late compiling from the IR to machine code via just-in-time compilation (JIT), similar to Java. The type system consists of basic types such as integer or floating-point numbers and five derived types: pointers, arrays, vectors, structures, and functions. A type construct in a concrete language can be represented by combining these basic types in LLVM. For example, a class in C++ can be represented by a mix of structures, functions and arrays of function pointers.
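The SSA constraint is visible directly in the textual IR: every temporary is assigned exactly once, making dependencies between values explicit. A minimal hand-written sketch (the function name and values are illustrative, not taken from the source):

```llvm
; Each temporary (%sum, %sq) is assigned exactly once (SSA form),
; so the dependency of %sq on %sum is explicit in the instruction stream.
define i32 @square_of_sum(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b        ; %sum = a + b
  %sq  = mul i32 %sum, %sum    ; %sq = (a + b) * (a + b)
  ret i32 %sq
}
```

Because no register is ever reassigned, optimization passes can reason about each value's single definition site without tracking mutations.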
The LLVM JIT compiler can optimize unneeded static branches out of a program at runtime, and thus is useful for partial evaluation in cases where a program has many options, most of which can easily be determined unneeded in a specific environment. This feature is used in the OpenGL pipeline of Mac OS X Leopard (v10.5) to provide support for missing hardware features.[40]
Graphics code within the OpenGL stack can be left in intermediate representation and then compiled when run on the target machine. On systems with high-end graphics processing units (GPUs), the resulting code remains quite thin, passing the instructions on to the GPU with minimal changes. On systems with low-end GPUs, LLVM will compile optional procedures that run on the local central processing unit (CPU) and emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using Intel GMA chipsets. A similar system, LLVMpipe, was developed under Gallium3D and incorporated into the GNOME shell to allow it to run without a proper 3D hardware driver loaded.[41]
In 2011, programs compiled by GCC outperformed those from LLVM by 10% on average.[42][43] In 2013, Phoronix reported that LLVM had caught up with GCC, compiling binaries of approximately equal performance.[44]
LLVM has become an umbrella project containing multiple components.
LLVM was originally written to be a replacement for the extant code generator in the GCC stack,[45] and many of the GCC frontends were modified to work with it, resulting in the now-defunct LLVM-GCC suite. The modifications generally involved a GIMPLE-to-LLVM IR step so that LLVM optimizers and codegen could be used instead of GCC's GIMPLE system. Apple was a significant user of LLVM-GCC through Xcode 4.x (2013).[46][47] This use of the GCC frontend was considered a temporary measure, which became mostly obsolete with the advent of LLVM/Clang's more modern, modular codebase and faster compilation.
LLVM currently[as of?] supports compiling of Ada, C, C++, D, Delphi, Fortran, Haskell, Julia, Objective-C, Rust, and Swift using various frontends.
Widespread interest in LLVM has led to several efforts to develop new frontends for many languages. One such frontend is Clang, a newer compiler supporting C, C++, and Objective-C. Primarily supported by Apple, Clang is aimed at replacing the C/Objective-C compiler in the GCC system with a system that is more easily integrated with integrated development environments (IDEs) and has wider support for multithreading. Support for OpenMP directives has been included in Clang since release 3.8.[48]
The Utrecht Haskell compiler can generate code for LLVM. While the generator was in early stages of development, in many cases it was more efficient than the C code generator.[49] The Glasgow Haskell Compiler (GHC) backend uses LLVM and achieves a 30% speed-up of compiled code relative to native-code compilation via GHC or C code generation followed by compilation, missing only one of the many optimization techniques implemented by GHC.[50]
Many other components are in various stages of development, including, but not limited to, the Rust compiler, a Java bytecode frontend, a Common Intermediate Language (CIL) frontend, the MacRuby implementation of Ruby 1.9, various frontends for Standard ML, and a new graph coloring register allocator.[citation needed]

The core of LLVM is the intermediate representation (IR), a low-level programming language similar to assembly. IR is a strongly typed reduced instruction set computer (RISC) instruction set that abstracts away most details of the target. For example, the calling convention is abstracted through call and ret instructions with explicit arguments. Also, instead of a fixed set of registers, IR uses an infinite set of temporaries of the form %0, %1, etc. LLVM supports three equivalent forms of IR: a human-readable assembly format,[51] an in-memory format suitable for frontends, and a dense bitcode format for serializing. A simple "Hello, world!" program in the human-readable IR format:
```llvm
@.str = internal constant [14 x i8] c"Hello, world\0A\00"

declare i32 @printf(ptr, ...)

define i32 @main(i32 %argc, ptr %argv) nounwind {
entry:
  %tmp1 = getelementptr [14 x i8], ptr @.str, i32 0, i32 0
  %tmp2 = call i32 (ptr, ...) @printf(ptr %tmp1) nounwind
  ret i32 0
}
```
The many different conventions used and features provided by different targets mean that LLVM cannot truly produce a target-independent IR and retarget it without breaking some established rules. Examples of target dependence beyond what is explicitly mentioned in the documentation can be found in a 2011 proposal for "wordcode", a fully target-independent variant of LLVM IR intended for online distribution.[52] A more practical example is PNaCl.[53]
The LLVM project also introduces another type of intermediate representation named MLIR,[54] which helps build reusable and extensible compiler infrastructure by employing a plugin architecture named Dialect.[55] It enables the use of higher-level information on the program structure in the process of optimization, including polyhedral compilation.
At version 16, LLVM supports many instruction sets, including IA-32, x86-64, ARM, Qualcomm Hexagon, LoongArch, M68K, MIPS, NVIDIA Parallel Thread Execution (PTX, also named NVPTX in LLVM documentation), PowerPC, AMD TeraScale,[56] most recent AMD GPUs (also named AMDGPU in LLVM documentation),[57] SPARC, z/Architecture (also named SystemZ in LLVM documentation), and XCore.
Some features are not available on some platforms. Most features are present for IA-32, x86-64, z/Architecture, ARM, and PowerPC.[58] RISC-V is supported as of version 7.
In the past, LLVM also supported other backends, fully or partially, including the C backend, Cell SPU, mblaze (MicroBlaze),[59] AMD R600, DEC/Compaq Alpha (Alpha AXP),[60] and Nios2,[61] but that hardware is mostly obsolete, and LLVM developers decided the support and maintenance costs were no longer justified.[citation needed]
LLVM also supports WebAssembly as a target, enabling compiled programs to execute in WebAssembly-enabled environments such as Google Chrome/Chromium, Firefox, Microsoft Edge, Apple Safari, or WAVM. LLVM-compliant WebAssembly compilers typically support mostly unmodified source code written in C, C++, D, Rust, Nim, Kotlin, and several other languages.
The LLVM machine code (MC) subproject is LLVM's framework for translating machine instructions between textual forms and machine code. Formerly, LLVM relied on the system assembler, or one provided by a toolchain, to translate assembly into machine code. LLVM MC's integrated assembler supports most LLVM targets, including IA-32, x86-64, ARM, and ARM64. For some targets, including the various MIPS instruction sets, integrated assembly support is usable but still in the beta stage.[citation needed]
The lld subproject is an attempt to develop a built-in, platform-independent linker for LLVM.[62] lld aims to remove dependence on a third-party linker. As of May 2017[update], lld supports ELF, PE/COFF, Mach-O, and WebAssembly,[63] in descending order of completeness. lld is faster than both flavors of GNU ld.[citation needed]
Unlike the GNU linkers, lld has built-in support for link-time optimization (LTO). This allows for faster code generation because it bypasses the use of a linker plugin, but it prevents interoperability with other flavors of LTO.[64]
The LLVM project includes an implementation of the C++ Standard Library named libc++, dual-licensed under the MIT License and the UIUC license.[65] Since v9.0.0, it has been relicensed to the Apache License 2.0 with LLVM Exceptions.[3]
Polly implements a suite of cache-locality optimizations as well as auto-parallelism and vectorization using a polyhedral model.[66]
llvm-libc is an incomplete, upcoming, ABI-independent C standard library designed by and for the LLVM project.[67]
Due to its permissive license, many vendors release their own tuned forks of LLVM. This is officially recognized by LLVM's documentation, which for this reason advises against using version numbers in feature checks.[68]