LLVM Language Reference Manual

Abstract

This document is a reference manual for the LLVM assembly language. LLVM is a Static Single Assignment (SSA) based representation that provides type safety, low-level operations, flexibility, and the capability of representing ‘all’ high-level languages cleanly. It is the common code representation used throughout all phases of the LLVM compilation strategy.

Introduction

The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human readable assembly language representation. This allows LLVM to provide a powerful intermediate representation for efficient compiler transformations and analysis, while providing a natural means to debug and visualize the transformations. The three different forms of LLVM are all equivalent. This document describes the human readable representation and notation.

The LLVM representation aims to be light-weight and low-level while being expressive, typed, and extensible at the same time. It aims to be a “universal IR” of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are “universal IR’s”, allowing many source languages to be mapped to them). By providing type information, LLVM can be used as the target of optimizations: for example, through pointer analysis, it can be proven that a C automatic variable is never accessed outside of the current function, allowing it to be promoted to a simple SSA value instead of a memory location.

Well-Formedness

It is important to note that this document describes ‘well formed’ LLVM assembly language. There is a difference between what the parser accepts and what is considered ‘well formed’. For example, the following instruction is syntactically okay, but not well formed:

%x = add i32 1, %x

because the definition of %x does not dominate all of its uses. The LLVM infrastructure provides a verification pass that may be used to verify that an LLVM module is well formed. This pass is automatically run by the parser after parsing input assembly and by the optimizer before it outputs bitcode. The violations pointed out by the verifier pass indicate bugs in transformation passes or input to the parser.
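As a minimal sketch of this distinction, the two functions below contrast an ill-formed definition with a well-formed one. The function names @bad and @good are hypothetical, chosen only for illustration.

```llvm
; Parses, but is rejected by the verifier: %x is used by its own
; definition, so the definition does not dominate that use.
define i32 @bad() {
entry:
  %x = add i32 1, %x
  ret i32 %x
}

; Well formed: every operand is defined before it is used.
define i32 @good(i32 %a) {
entry:
  %x = add i32 1, %a
  ret i32 %x
}
```

A module containing @bad fails verification as a whole; removing that function leaves a module the verifier accepts.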

Syntax

Identifiers

LLVM identifiers come in two basic types: global and local. Global identifiers (functions, global variables) begin with the '@' character. Local identifiers (register names, types) begin with the '%' character. Additionally, there are three different formats for identifiers, for different purposes:

  1. Named values are represented as a string of characters with their prefix. For example, %foo, @DivisionByZero, %a.really.long.identifier. The actual regular expression used is ‘[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*’. Identifiers that require other characters in their names can be surrounded with quotes. Special characters may be escaped using "\xx" where xx is the ASCII code for the character in hexadecimal. In this way, any character can be used in a name value, even quotes themselves. The "\01" prefix can be used on global values to suppress mangling.

  2. Unnamed values are represented as an unsigned numeric value with their prefix. For example, %12, @2, %44.

  3. Constants, which are described in the section Constants below.

LLVM requires that values start with a prefix for two reasons: Compilers don’t need to worry about name clashes with reserved words, and the set of reserved words may be expanded in the future without penalty. Additionally, unnamed identifiers allow a compiler to quickly come up with a temporary variable without having to avoid symbol table conflicts.
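The identifier formats can be seen side by side in a short sketch. All names here (@counter, @sum, and so on) are hypothetical and serve only to show the syntax.

```llvm
@counter = global i32 0               ; named global
@"name with spaces" = global i32 1    ; quoted global name
@0 = private constant i32 7           ; unnamed global

define i32 @sum(i32 %lhs, i32 %rhs) {
entry:
  %total = add i32 %lhs, %rhs         ; named local value
  %0 = load i32, ptr @counter         ; unnamed local value
  %"final value" = add i32 %total, %0 ; quoted local name
  ret i32 %"final value"
}
```

Global and local unnamed values are numbered independently, which is why both @0 and %0 can appear in the same module.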

Reserved words in LLVM are very similar to reserved words in other languages. There are keywords for different opcodes (‘add’, ‘bitcast’, ‘ret’, etc.), for primitive type names (‘void’, ‘i32’, etc.), and others. These reserved words cannot conflict with variable names, because none of them start with a prefix character ('%' or '@').

Here is an example of LLVM code to multiply the integer variable ‘%X’ by 8:

The easy way:

%result = mul i32 %X, 8

After strength reduction:

%result = shl i32 %X, 3

And the hard way:

%0 = add i32 %X, %X           ; yields i32:%0
%1 = add i32 %0, %0           /* yields i32:%1 */
%result = add i32 %1, %1

This last way of multiplying %X by 8 illustrates several important lexical features of LLVM:

  1. Comments are delimited with a ‘;’ and go until the end of line. Alternatively, comments can start with /* and terminate with */.

  2. Unnamed temporaries are created when the result of a computation is not assigned to a named value.

  3. By default, unnamed temporaries are numbered sequentially (using a per-function incrementing counter, starting with 0). However, when explicitly specifying temporary numbers, it is allowed to skip over numbers.

    Note that basic blocks and unnamed function parameters are included in this numbering. For example, if the entry basic block is not given a label name and all function parameters are named, then it will get number 0.

It also shows a convention that we follow in this document. When demonstrating instructions, we will follow an instruction with a comment that defines the type and name of value produced.
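The numbering rule for unnamed values is easy to get wrong when the entry block has no label, so a small sketch may help. The function names @twice and @inc are hypothetical.

```llvm
; The unnamed parameter takes number 0 and the unlabeled entry block
; takes number 1, so the first unnamed instruction result is %2.
define i32 @twice(i32) {
  %2 = add i32 %0, %0
  ret i32 %2
}

; All parameters are named, so the unlabeled entry block takes
; number 0 and the first unnamed instruction result is %1.
define i32 @inc(i32 %x) {
  %1 = add i32 %x, 1
  ret i32 %1
}
```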

String constants

Strings in LLVM programs are delimited by " characters. Within a string, all bytes are treated literally with the exception of \ characters, which start escapes, and the first " character, which ends the string.

There are two kinds of escapes.

  • \\ represents a single \ character.

  • \ followed by two hexadecimal characters (0-9, a-f, or A-F) represents the byte with the given value (e.g. \00 represents a null byte).

To represent a " character, use \22. (\" will end the string with a trailing \.)

Newlines do not terminate string constants; strings can span multiple lines.

The interpretation of string constants (e.g. their character encoding) depends on context.
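The escape rules can be sketched with a few character-array constants. The global names are hypothetical; the array lengths count the bytes after escapes are decoded.

```llvm
; 'hello' (5 bytes), newline \0A (1 byte), NUL \00 (1 byte) = 7 bytes
@greeting = private constant [7 x i8] c"hello\0A\00"

; \22 is the double-quote byte: "hi" plus NUL = 5 bytes
@quoted = private constant [5 x i8] c"\22hi\22\00"

; \\ is a single backslash: a, \, b, NUL = 4 bytes
@backslash = private constant [4 x i8] c"a\\b\00"
```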

High Level Structure

Module Structure

LLVM programs are composed of Module’s, each of which is a translation unit of the input programs. Each module consists of functions, global variables, and symbol table entries. Modules may be combined together with the LLVM linker, which merges function (and global variable) definitions, resolves forward declarations, and merges symbol table entries. Here is an example of the “hello world” module:

; Declare the string constant as a global constant.
@.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"

; External declaration of the puts function
declare i32 @puts(ptr captures(none)) nounwind

; Definition of main function
define i32 @main() {
  ; Call puts function to write out the string to stdout.
  call i32 @puts(ptr @.str)
  ret i32 0
}

; Named metadata
!0 = !{i32 42, null, !"string"}
!foo = !{!0}

This example is made up of a global variable named “.str”, an external declaration of the “puts” function, a function definition for “main” and named metadata “foo”.

In general, a module is made up of a list of global values (where both functions and global variables are global values). Global values are represented by a pointer to a memory location (in this case, a pointer to an array of char, and a pointer to a function), and have one of the following linkage types.

Linkage Types

All Global Variables and Functions have one of the following types of linkage:

private

Global values with “private” linkage are only directly accessible by objects in the current module. In particular, linking code into a module with a private global value may cause the private to be renamed as necessary to avoid collisions. Because the symbol is private to the module, all references can be updated. This doesn’t show up in any symbol table in the object file.

internal

Similar to private, but the value shows as a local symbol (STB_LOCAL in the case of ELF) in the object file. This corresponds to the notion of the ‘static’ keyword in C.

available_externally

Globals with “available_externally” linkage are never emitted into the object file corresponding to the LLVM module. From the linker’s perspective, an available_externally global is equivalent to an external declaration. They exist to allow inlining and other optimizations to take place given knowledge of the definition of the global, which is known to be somewhere outside the module. Globals with available_externally linkage are allowed to be discarded at will, and allow inlining and other optimizations. This linkage type is only allowed on definitions, not declarations.

linkonce

Globals with “linkonce” linkage are merged with other globals of the same name when linkage occurs. This can be used to implement some forms of inline functions, templates, or other code which must be generated in each translation unit that uses it, but where the body may be overridden with a more definitive definition later. Unreferenced linkonce globals are allowed to be discarded. Note that linkonce linkage does not actually allow the optimizer to inline the body of this function into callers because it doesn’t know if this definition of the function is the definitive definition within the program or whether it will be overridden by a stronger definition. To enable inlining and other optimizations, use “linkonce_odr” linkage.

weak

“weak” linkage has the same merging semantics as linkonce linkage, except that unreferenced globals with weak linkage may not be discarded. This is used for globals that are declared “weak” in C source code.

common

“common” linkage is most similar to “weak” linkage, but they are used for tentative definitions in C, such as “int X;” at global scope. Symbols with “common” linkage are merged in the same way as weak symbols, and they may not be deleted if unreferenced. common symbols may not have an explicit section, must have a zero initializer, and may not be marked ‘constant’. Functions and aliases may not have common linkage.

appending

“appending” linkage may only be applied to global variables of pointer to array type. When two global variables with appending linkage are linked together, the two global arrays are appended together. This is the LLVM, typesafe, equivalent of having the system linker append together “sections” with identical names when .o files are linked.

Unfortunately this doesn’t correspond to any feature in .o files, so it can only be used for variables like llvm.global_ctors which llvm interprets specially.

extern_weak

The semantics of this linkage follow the ELF object file model: the symbol is weak until linked; if not linked, the symbol becomes null instead of being an undefined reference.

linkonce_odr, weak_odr

The odr suffix indicates that all globals defined with the given name are equivalent, along the lines of the C++ “one definition rule” (“ODR”). Informally, this means we can inline functions and fold loads of constants.

Formally, use the following definition: when an odr function is called, one of the definitions is non-deterministically chosen to run. For odr variables, if any byte in the value is not equal in all initializers, that byte is a poison value. For aliases and ifuncs, apply the rule for the underlying function or variable.

These linkage types are otherwise the same as their non-odr versions.

external

If none of the above identifiers are used, the global is externally visible, meaning that it participates in linkage and can be used to resolve external symbol references.

It is illegal for a global variable or function declaration to have any linkage type other than external or extern_weak.
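A few of these linkage types can be sketched together in one module. Every global and function name below is hypothetical.

```llvm
@.msg = private unnamed_addr constant [3 x i8] c"ok\00" ; no symbol table entry
@cache = internal global i32 0            ; local symbol, like C 'static'
@tentative = common global i32 0          ; like 'int tentative;' in C
@maybe_missing = extern_weak global i32   ; declaration; null if never linked

; May be merged with equivalent definitions from other modules.
define linkonce_odr i32 @inline_candidate() {
  ret i32 1
}

; May be overridden by another definition at link time.
define weak void @overridable() {
  ret void
}

; Body is visible to the optimizer but never emitted into the object file.
define available_externally i32 @known_elsewhere() {
  ret i32 2
}

; Declarations may only be external (the default) or extern_weak.
declare extern_weak void @optional_hook()
declare void @ordinary_external()
```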

Calling Conventions

LLVM functions, calls and invokes can all have an optional calling convention specified for the call. The calling convention of any pair of dynamic caller/callee must match, or the behavior of the program is undefined. The following calling conventions are supported by LLVM, and more may be added in the future:

“ccc” - The C calling convention

This calling convention (the default if no other calling convention is specified) matches the target C calling conventions. This calling convention supports varargs function calls and tolerates some mismatch in the declared prototype and implemented declaration of the function (as does normal C).

“fastcc” - The fast calling convention

This calling convention attempts to make calls as fast as possible (e.g. by passing things in registers). This calling convention allows the target to use whatever tricks it wants to produce fast code for the target, without having to conform to an externally specified ABI (Application Binary Interface). Tail calls can only be optimized when this, the tailcc, the GHC or the HiPE convention is used. This calling convention does not support varargs and requires the prototype of all callees to exactly match the prototype of the function definition.

“coldcc” - The cold calling convention

This calling convention attempts to make code in the caller as efficient as possible under the assumption that the call is not commonly executed. As such, these calls often preserve all registers so that the call does not break any live ranges in the caller side. This calling convention does not support varargs and requires the prototype of all callees to exactly match the prototype of the function definition. Furthermore the inliner doesn’t consider such function calls for inlining.

“ghccc” - GHC convention

This calling convention has been implemented specifically for use by the Glasgow Haskell Compiler (GHC). It passes everything in registers, going to extremes to achieve this by disabling callee save registers. This calling convention should not be used lightly but only for specific situations such as an alternative to the register pinning performance technique often used when implementing functional programming languages. At the moment only X86, AArch64, and RISCV support this convention. The following limitations exist:

  • On X86-32 only up to 4 bit type parameters are supported. No floating-point types are supported.

  • On X86-64 only up to 10 bit type parameters and 6 floating-point parameters are supported.

  • On AArch64 only up to 4 32-bit floating-point parameters, 4 64-bit floating-point parameters, and 10 bit type parameters are supported.

  • RISCV64 only supports up to 11 bit type parameters, 4 32-bit floating-point parameters, and 4 64-bit floating-point parameters.

This calling convention supports tail call optimization but requires that both the caller and the callee use it.

“cc 11” - The HiPE calling convention

This calling convention has been implemented specifically for use by the High-Performance Erlang (HiPE) compiler, the native code compiler of the Ericsson’s Open Source Erlang/OTP system. It uses more registers for argument passing than the ordinary C calling convention and defines no callee-saved registers. The calling convention properly supports tail call optimization but requires that both the caller and the callee use it. It uses a register pinning mechanism, similar to GHC’s convention, for keeping frequently accessed runtime components pinned to specific hardware registers. At the moment only X86 supports this convention (both 32 and 64 bit).

“anyregcc” - Dynamic calling convention for code patching

This is a special convention that supports patching an arbitrary code sequence in place of a call site. This convention forces the call arguments into registers but allows them to be dynamically allocated. This can currently only be used with calls to llvm.experimental.patchpoint because only this intrinsic records the location of its arguments in a side table. See Stack maps and patch points in LLVM.

“preserve_mostcc” - The PreserveMost calling convention

This calling convention attempts to make the code in the caller as unintrusive as possible. This convention behaves identically to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This alleviates the burden of saving and recovering a large register set before and after the call in the caller. If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn’t apply for values returned in callee-saved registers.

  • On X86-64 the callee preserves all general purpose registers, except for R11 and return registers, if any. R11 can be used as a scratch register. The treatment of floating-point registers (XMMs/YMMs) matches the OS’s C calling convention: on most platforms, they are not preserved and need to be saved by the caller, but on Windows, xmm6-xmm15 are preserved.

  • On AArch64 the callee preserves all general purpose registers, except X0-X8 and X16-X18. Not allowed with nest.

The idea behind this convention is to support calls to runtime functions that have a hot path and a cold path. The hot path is usually a small piece of code that doesn’t use many registers. The cold path might need to call out to another function and therefore only needs to preserve the caller-saved registers, which haven’t already been saved by the caller. The PreserveMost calling convention is very similar to the cold calling convention in terms of caller/callee-saved registers, but they are used for different types of function calls. coldcc is for function calls that are rarely executed, whereas preserve_mostcc function calls are intended to be on the hot path and definitely executed a lot. Furthermore preserve_mostcc doesn’t prevent the inliner from inlining the function call.

This calling convention will be used by a future version of the ObjectiveC runtime and should therefore still be considered experimental at this time. Although this convention was created to optimize certain runtime calls to the ObjectiveC runtime, it is not limited to this runtime and might be used by other runtimes in the future too. The current implementation only supports X86-64, but the intention is to support more architectures in the future.

“preserve_allcc” - The PreserveAll calling convention

This calling convention attempts to make the code in the caller even less intrusive than the PreserveMost calling convention. This calling convention also behaves identically to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers. This removes the burden of saving and recovering a large register set before and after the call in the caller. If the arguments are passed in callee-saved registers, then they will be preserved by the callee across the call. This doesn’t apply for values returned in callee-saved registers.

  • On X86-64 the callee preserves all general purpose registers, except for R11. R11 can be used as a scratch register. Furthermore it also preserves all floating-point registers (XMMs/YMMs).

  • On AArch64 the callee preserves all general purpose registers, except X0-X8 and X16-X18. Furthermore it also preserves the lower 128 bits of the V8-V31 SIMD floating point registers. Not allowed with nest.

The idea behind this convention is to support calls to runtime functions that don’t need to call out to any other functions.

This calling convention, like the PreserveMost calling convention, will be used by a future version of the ObjectiveC runtime and should be considered experimental at this time.

“preserve_nonecc” - The PreserveNone calling convention

This calling convention doesn’t preserve any general registers, so all general registers are caller saved registers. It also uses all general registers to pass arguments. This attribute doesn’t impact non-general purpose registers (e.g. floating point registers, on X86 XMMs/YMMs). Non-general purpose registers still follow the standard C calling convention. Currently it is for x86_64 and AArch64 only.

“cxx_fast_tlscc” - The CXX_FAST_TLS calling convention for access functions

Clang generates an access function to access C++-style Thread Local Storage (TLS). The access function generally has an entry block, an exit block and an initialization block that is run at the first time. The entry and exit blocks can access a few TLS IR variables; each access will be lowered to a platform-specific sequence.

This calling convention aims to minimize overhead in the caller by preserving as many registers as possible (all the registers that are preserved on the fast path, composed of the entry and exit blocks).

This calling convention behaves identically to the C calling convention on how arguments and return values are passed, but it uses a different set of caller/callee-saved registers.

Given that each platform has its own lowering sequence, hence its own set of preserved registers, we can’t use the existing PreserveMost.

  • On X86-64 the callee preserves all general purpose registers, except for RDI and RAX.

“tailcc” - Tail callable calling convention

This calling convention ensures that calls in tail position will always be tail call optimized. This calling convention is equivalent to fastcc, except for an additional guarantee that tail calls will be produced whenever possible. Tail calls can only be optimized when this, the fastcc, the GHC or the HiPE convention is used. This calling convention does not support varargs and requires the prototype of all callees to exactly match the prototype of the function definition.

“swiftcc” - This calling convention is used for the Swift language.

  • On X86-64 RCX and R8 are available for additional integer returns, and XMM2 and XMM3 are available for additional FP/vector returns.

  • On iOS platforms, we use AAPCS-VFP calling convention.

swifttailcc

This calling convention is like swiftcc in most respects, but also the callee pops the argument area of the stack so that mandatory tail calls are possible as in tailcc.

“cfguard_checkcc” - Windows Control Flow Guard (Check mechanism)

This calling convention is used for the Control Flow Guard check function, calls to which can be inserted before indirect calls to check that the call target is a valid function address. The check function has no return value, but it will trigger an OS-level error if the address is not a valid target. The set of registers preserved by the check function, and the register containing the target address are architecture-specific.

  • On X86 the target address is passed in ECX.

  • On ARM the target address is passed in R0.

  • On AArch64 the target address is passed in X15.

“cc <n>” - Numbered convention

Any calling convention may be specified by number, allowing target-specific calling conventions to be used. Target specific calling conventions start at 64.

More calling conventions can be added/defined on an as-needed basis, to support Pascal conventions or any other well-known target-independent convention.
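As a rough sketch of how a convention is spelled on both the definition and the call site, consider the functions below. The names @helper, @caller and @sum_to are hypothetical; the key point is that the convention on each call matches the convention of the callee.

```llvm
; Definition using the fast calling convention.
define fastcc i32 @helper(i32 %x) {
entry:
  %r = add i32 %x, 1
  ret i32 %r
}

; The call site must name the same convention as the callee.
define i32 @caller(i32 %x) {
entry:
  %r = call fastcc i32 @helper(i32 %x)
  ret i32 %r
}

; A self-recursive accumulator whose recursive call is in tail position.
; Caller and callee both use tailcc and have identical prototypes.
define tailcc i32 @sum_to(i32 %n, i32 %acc) {
entry:
  %done = icmp eq i32 %n, 0
  br i1 %done, label %exit, label %next

next:
  %n1 = sub i32 %n, 1
  %acc1 = add i32 %acc, %n
  %r = tail call tailcc i32 @sum_to(i32 %n1, i32 %acc1)
  ret i32 %r

exit:
  ret i32 %acc
}
```

Omitting fastcc on the call in @caller would leave the call using the default C convention while the callee uses fastcc, which is the mismatch the text above describes as undefined behavior.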

Visibility Styles

All Global Variables and Functions have one of the following visibility styles:

“default” - Default style

On targets that use the ELF object file format, default visibility means that the declaration is visible to other modules and, in shared libraries, means that the declared entity may be overridden. On Darwin, default visibility means that the declaration is visible to other modules. On XCOFF, default visibility means no explicit visibility bit will be set and whether the symbol is visible (i.e. “exported”) to other modules depends primarily on export lists provided to the linker. Default visibility corresponds to “external linkage” in the language.

“hidden” - Hidden style

Two declarations of an object with hidden visibility refer to the same object if they are in the same shared object. Usually, hidden visibility indicates that the symbol will not be placed into the dynamic symbol table, so no other module (executable or shared library) can reference it directly.

“protected” - Protected style

On ELF, protected visibility indicates that the symbol will be placed in the dynamic symbol table, but that references within the defining module will bind to the local symbol. That is, the symbol cannot be overridden by another module.

A symbol with internal or private linkage must have default visibility.
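The visibility keyword is written after the linkage (if any) and before the rest of the definition. A minimal sketch, with hypothetical names:

```llvm
@exported = default global i32 0       ; explicit default visibility
@impl_detail = hidden global i32 0     ; kept out of the dynamic symbol table

; Exported, but references inside this module bind locally.
define protected void @api_entry() {
  ret void
}

; internal and private symbols must keep default visibility.
@file_local = internal global i32 0
```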

DLL Storage Classes

All Global Variables, Functions and Aliases can have one of the following DLL storage classes:

dllimport

“dllimport” causes the compiler to reference a function or variable via a global pointer to a pointer that is set up by the DLL exporting the symbol. On Microsoft Windows targets, the pointer name is formed by combining __imp_ and the function or variable name.

dllexport

On Microsoft Windows targets, “dllexport” causes the compiler to provide a global pointer to a pointer in a DLL, so that it can be referenced with the dllimport attribute. The pointer name is formed by combining __imp_ and the function or variable name. On XCOFF targets, dllexport indicates that the symbol will be made visible to other modules using “exported” visibility and thus placed by the linker in the loader section symbol table. Since this storage class exists for defining a dll interface, the compiler, assembler and linker know it is externally referenced and must refrain from deleting the symbol.

A symbol with internal or private linkage cannot have a DLL storage class.
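A short sketch of both storage classes follows. The symbol names are hypothetical, and the example assumes a Windows target.

```llvm
; Symbols provided by another DLL.
@imported_var = external dllimport global i32
declare dllimport void @imported_fn()

; Symbols this module makes available to importers.
@exported_var = dllexport global i32 0

define dllexport void @exported_fn() {
  ret void
}
```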

Thread Local Storage Models

A variable may be defined as thread_local, which means that it will not be shared by threads (each thread will have a separate copy of the variable). Not all targets support thread-local variables. Optionally, a TLS model may be specified:

localdynamic

For variables that are only used within the current shared library.

initialexec

For variables in modules that will not be loaded dynamically.

localexec

For variables defined in the executable and only used within it.

If no explicit model is given, the “general dynamic” model is used.

The models correspond to the ELF TLS models; see ELF Handling For Thread-Local Storage for more information on the circumstances under which the different models may be used. The target may choose a different TLS model if the specified model is not supported, or if a better choice of model can be made.

A model can also be specified in an alias, but then it only governs howthe alias is accessed. It will not have any effect in the aliasee.

For platforms without linker support of ELF TLS model, the -femulated-tls flag can be used to generate GCC compatible emulated TLS code.
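The TLS model is written in parentheses after thread_local. The sketch below shows each spelling on a hypothetical global.

```llvm
@tls_default = thread_local global i32 0                 ; general dynamic model
@tls_ld = thread_local(localdynamic) global i32 0        ; used only within this shared library
@tls_ie = thread_local(initialexec) global i32 0         ; module is not loaded dynamically
@tls_le = internal thread_local(localexec) global i32 0  ; defined and used in the executable
```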

Runtime Preemption Specifiers

Global variables, functions and aliases may have an optional runtime preemption specifier. If a preemption specifier isn’t given explicitly, then a symbol is assumed to be dso_preemptable.

dso_preemptable

Indicates that the function or variable may be replaced by a symbol from outside the linkage unit at runtime.

dso_local

The compiler may assume that a function or variable marked as dso_local will resolve to a symbol within the same linkage unit. Direct access will be generated even if the definition is not within this compilation unit.
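A brief sketch of both specifiers, using hypothetical names:

```llvm
@local_counter = dso_local global i32 0        ; resolves within this linkage unit
@interposable = dso_preemptable global i32 0   ; may be replaced at runtime

; A declaration can be dso_local even though the definition lives in
; another compilation unit of the same linkage unit.
declare dso_local void @same_unit()

define dso_local i32 @read_counter() {
  %v = load i32, ptr @local_counter
  ret i32 %v
}
```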

Structure Types

LLVM IR allows you to specify both “identified” and “literal” structure types. Literal types are uniqued structurally, but identified types are never uniqued. An opaque structural type can also be used to forward declare a type that is not yet available.

An example of an identified structure specification is:

%mytype = type { %mytype*, i32 }

Prior to the LLVM 3.0 release, identified types were structurally uniqued. Only literal types are uniqued in recent versions of LLVM.
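The three kinds of structure type can be sketched as follows. The type and global names are hypothetical, and the example uses the opaque ptr type for pointer fields.

```llvm
%node = type { ptr, i32 }     ; identified structure type
%handle = type opaque         ; opaque type, body not yet available

; A literal structure type is written inline wherever it is used.
@origin = global { i32, i32 } { i32 0, i32 0 }

; An identified type is referred to by name.
@head = global %node { ptr null, i32 0 }
```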

Non-Integral Pointer Type

Note: non-integral pointer types are a work in progress, and they should be considered experimental at this time.

LLVM IR optionally allows the frontend to denote pointers in certain address spaces as “non-integral” via the datalayout string. Non-integral pointer types represent pointers that have an unspecified bitwise representation; that is, the integral representation may be target dependent or unstable (not backed by a fixed integer).

inttoptr and ptrtoint instructions have the same semantics as for integral (i.e. normal) pointers in that they convert integers to and from corresponding pointer types, but there are additional implications to be aware of. Because the bit-representation of a non-integral pointer may not be stable, two identical casts of the same operand may or may not return the same value. Said differently, the conversion to or from the non-integral type depends on environmental state in an implementation defined manner.

If the frontend wishes to observe a particular value following a cast, the generated IR must fence with the underlying environment in an implementation defined manner. (In practice, this tends to require noinline routines for such operations.)

From the perspective of the optimizer, inttoptr and ptrtoint for non-integral types are analogous to ones on integral types with one key exception: the optimizer may not, in general, insert new dynamic occurrences of such casts. If a new cast is inserted, the optimizer would need to either ensure that a) all possible values are valid, or b) appropriate fencing is inserted. Since the appropriate fencing is implementation defined, the optimizer can’t do the latter. The former is challenging as many commonly expected properties, such as ptrtoint(v)-ptrtoint(v) == 0, don’t hold for non-integral types. Similar restrictions apply to intrinsics that might examine the pointer bits, such as llvm.ptrmask.

The alignment information provided by the frontend for a non-integral pointer (typically using attributes or metadata) must be valid for every possible representation of the pointer.
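As a sketch, the module below marks address space 1 as non-integral through the ni component of the datalayout string. The choice of address space 1 and the function name @observe are assumptions made for illustration.

```llvm
; Pointers in address space 1 are non-integral.
target datalayout = "e-ni:1"

; This cast is legal, but two executions of it on the same pointer are
; not guaranteed to produce the same integer.
define i64 @observe(ptr addrspace(1) %p) {
  %bits = ptrtoint ptr addrspace(1) %p to i64
  ret i64 %bits
}
```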

Global Variables

Global variables define regions of memory allocated at compilation time instead of run-time.

Global variable definitions must be initialized with a sized value.

Global variables in other translation units can also be declared, in which case they don’t have an initializer.

Global variables can optionally specify a linkage type.

Either global variable definitions or declarations may have an explicit section to be placed in and may have an optional explicit alignment specified. If there is a mismatch between the explicit or inferred section information for the variable declaration and its definition the resulting behavior is undefined.

A variable may be defined as a global constant, which indicates that the contents of the variable will never be modified (enabling better optimization, allowing the global data to be placed in the read-only section of an executable, etc). Note that variables that need runtime initialization cannot be marked constant as there is a store to the variable.

LLVM explicitly allows declarations of global variables to be marked constant, even if the final definition of the global is not. This capability can be used to enable slightly better optimization of the program, but requires the language definition to guarantee that optimizations based on the ‘constantness’ are valid for the translation units that do not include the definition.

As SSA values, global variables define pointer values that are in scope for (i.e. they dominate) all basic blocks in the program. Global variables always define a pointer to their “content” type because they describe a region of memory, and all allocated objects in LLVM are accessed through pointers.

Global variables can be marked with unnamed_addr which indicates that the address is not significant, only the content. Constants marked like this can be merged with other constants if they have the same initializer. Note that a constant with significant address can be merged with a unnamed_addr constant, the result being a constant whose address is significant.

If the local_unnamed_addr attribute is given, the address is known to not be significant within the module.

A global variable may be declared to reside in a target-specific numbered address space. For targets that support them, address spaces may affect how optimizations are performed and/or what target instructions are used to access the variable. The default address space is zero. The address space qualifier must precede any other attributes.

LLVM allows an explicit section to be specified for globals. If the target supports it, it will emit globals to the section specified. Additionally, the global can be placed in a comdat if the target has the necessary support.

External declarations may have an explicit section specified. Section information is retained in LLVM IR for targets that make use of this information. Attaching section information to an external declaration is an assertion that its definition is located in the specified section. If the definition is located in a different section, the behavior is undefined.

LLVM allows an explicit code model to be specified for globals. If the target supports it, it will emit globals in the code model specified, overriding the code model used to compile the translation unit. The allowed values are “tiny”, “small”, “kernel”, “medium”, “large”. This may be extended in the future to specify global data layout that doesn’t cleanly fit into a specific code model.

By default, global initializers are optimized by assuming that global variables defined within the module are not modified from their initial values before the start of the global initializer. This is true even for variables potentially accessible from outside the module, including those with external linkage or appearing in @llvm.used or dllexported variables. This assumption may be suppressed by marking the variable with externally_initialized.
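A short sketch of the externally_initialized marker follows; the global name is a hypothetical placeholder, and the keyword position follows the global variable syntax given in this section:

```llvm
; Hypothetical global whose value may be set externally before the global
; initializers run, so its initial value of 0 should not be assumed there.
@runtime_patched = externally_initialized global i32 0, align 4
```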

An explicit alignment may be specified for a global, which must be a power of 2. If not present, or if the alignment is set to zero, the alignment of the global is set by the target to whatever it feels convenient. If an explicit alignment is specified, the global is forced to have exactly that alignment. Targets and optimizers are not allowed to over-align the global if the global has an assigned section. In this case, the extra alignment could be observable: for example, code could assume that the globals are densely packed in their section and try to iterate over them as an array, and alignment padding would break this iteration. For TLS variables, the module flag MaxTLSAlign, if present, limits the alignment to the given value. Optimizers are not allowed to impose a stronger alignment on these variables. The maximum alignment is 1 << 32.

For global variable declarations, as well as definitions that may be replaced at link time (linkonce, weak, extern_weak and common linkage types), the allocation size and alignment of the definition it resolves to must be greater than or equal to that of the declaration or replaceable definition, otherwise the behavior is undefined.

Globals can also have a DLL storage class, an optional runtime preemption specifier, optional global attributes and an optional list of attached metadata.

Variables and aliases can have a Thread Local Storage Model.

Globals cannot be or contain scalable vectors because their size is unknown at compile time. They are allowed in structs to facilitate intrinsics returning multiple values. Generally, structs containing scalable vectors are not considered “sized” and cannot be used in loads, stores, allocas, or GEPs. The only exception to this rule is for structs that contain scalable vectors of the same type (e.g. {<vscale x 2 x i32>, <vscale x 2 x i32>} contains the same type while {<vscale x 2 x i32>, <vscale x 2 x i64>} doesn’t). These kinds of structs (we may call them homogeneous scalable vector structs) are considered sized and can be used in loads, stores, allocas, but not GEPs.

Globals with the toc-data attribute set are stored in the TOC of XCOFF. Their alignments are not larger than that of a TOC entry. Optimizations should not increase their alignments to mitigate TOC overflow.

Syntax:

@<GlobalVarName> = [Linkage] [PreemptionSpecifier] [Visibility]
                   [DLLStorageClass] [ThreadLocal]
                   [(unnamed_addr|local_unnamed_addr)] [AddrSpace]
                   [ExternallyInitialized]
                   <global | constant> <Type> [<InitializerConstant>]
                   [, section "name"] [, partition "name"]
                   [, comdat [($name)]] [, align <Alignment>]
                   [, code_model "model"]
                   [, no_sanitize_address] [, no_sanitize_hwaddress]
                   [, sanitize_address_dyninit] [, sanitize_memtag]
                   (, !name !N)*

For example, the following defines a global in a numbered address space with an initializer, section, and alignment:

@G = addrspace(5) constant float 1.0, section "foo", align 4

The following example just declares a global variable:

@G = external global i32

The following example defines a global variable with the large code model:

@G = internal global i32 0, code_model "large"

The following example defines a thread-local global with the initialexec TLS model:

@G = thread_local(initialexec) global i32 0, align 4

Functions

LLVM function definitions consist of the “define” keyword, an optional linkage type, an optional runtime preemption specifier, an optional visibility style, an optional DLL storage class, an optional calling convention, an optional unnamed_addr attribute, a return type, an optional parameter attribute for the return type, a function name, a (possibly empty) argument list (each with optional parameter attributes), optional function attributes, an optional address space, an optional section, an optional partition, an optional alignment, an optional comdat, an optional garbage collector name, an optional prefix, an optional prologue, an optional personality, an optional list of attached metadata, an opening curly brace, a list of basic blocks, and a closing curly brace.

Syntax:

define [linkage] [PreemptionSpecifier] [visibility] [DLLStorageClass]
       [cconv] [ret attrs]
       <ResultType> @<FunctionName> ([argument list])
       [(unnamed_addr|local_unnamed_addr)] [AddrSpace] [fn Attrs]
       [section "name"] [partition "name"] [comdat [($name)]] [align N]
       [gc] [prefix Constant] [prologue Constant] [personality Constant]
       (!name !N)* { ... }

The argument list is a comma-separated sequence of arguments where each argument is of the following form:

Syntax:

<type> [parameter Attrs] [name]

LLVM function declarations consist of the “declare” keyword, an optional linkage type, an optional visibility style, an optional DLL storage class, an optional calling convention, an optional unnamed_addr or local_unnamed_addr attribute, an optional address space, a return type, an optional parameter attribute for the return type, a function name, a possibly empty list of arguments, an optional alignment, an optional garbage collector name, an optional prefix, and an optional prologue.

Syntax:

declare [linkage] [visibility] [DLLStorageClass]
        [cconv] [ret attrs]
        <ResultType> @<FunctionName> ([argument list])
        [(unnamed_addr|local_unnamed_addr)] [align N] [gc]
        [prefix Constant] [prologue Constant]

A function definition contains a list of basic blocks, forming the CFG (Control Flow Graph) for the function. Each basic block may optionally start with a label (giving the basic block a symbol table entry), contains a list of instructions and debug records, and ends with a terminator instruction (such as a branch or function return). If an explicit label name is not provided, a block is assigned an implicit numbered label, using the next value from the same counter as used for unnamed temporaries (see above). For example, if a function entry block does not have an explicit label, it will be assigned label “%0”, then the first unnamed temporary in that block will be “%1”, etc. If a numeric label is explicitly specified, it must match the numeric label that would be used implicitly.
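The implicit numbering rule can be illustrated with a small sketch. Both function names are hypothetical, and neither function takes arguments, so the counter starts at the entry block:

```llvm
; The entry block has no explicit label, so it is implicitly named %0 and
; the first unnamed temporary is %1.
define i32 @implicit_label() {
  %1 = add i32 40, 2
  ret i32 %1
}

; The same function with the numeric label written out. The label must be 0
; because that is the number that would have been assigned implicitly.
define i32 @explicit_label() {
0:
  %1 = add i32 40, 2
  ret i32 %1
}
```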

The first basic block in a function is special in two ways: it is immediately executed on entrance to the function, and it is not allowed to have predecessor basic blocks (i.e. there can not be any branches to the entry block of a function). Because the block can have no predecessors, it also cannot have any PHI nodes.
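One consequence is that a loop header has to be a separate block from the entry block. The following sketch (a hypothetical function, not taken from the manual) keeps the entry block free of predecessors and PHI nodes by branching to a dedicated loop block:

```llvm
define i32 @sum_below(i32 %n) {
entry:
  ; The entry block only falls through to the loop; nothing branches back here.
  br label %loop

loop:
  ; PHI nodes are legal here because %loop has two predecessors.
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %acc.next = add i32 %acc, %i
  %i.next = add i32 %i, 1
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret i32 %acc.next
}
```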

LLVM allows an explicit section to be specified for functions. If the target supports it, it will emit functions to the section specified. Additionally, the function can be placed in a COMDAT.

An explicit alignment may be specified for a function. If not present, or if the alignment is set to zero, the alignment of the function is set by the target to whatever it feels convenient. If an explicit alignment is specified, the function is forced to have at least that much alignment. All alignments must be a power of 2.

If the unnamed_addr attribute is given, the address is known to not be significant and two identical functions can be merged.

If the local_unnamed_addr attribute is given, the address is known to not be significant within the module.

If an explicit address space is not given, it will default to the program address space from the datalayout string.
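The following sketch combines several of the optional function properties above. The function name and the ELF-style section name ".text.hot" are assumptions made for illustration:

```llvm
; unnamed_addr: the address is not significant, so an identical function
; could be merged with this one. The section and alignment are explicit.
define internal i32 @hot_path(i32 %x) unnamed_addr section ".text.hot" align 16 {
entry:
  %r = add i32 %x, 1
  ret i32 %r
}
```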

Aliases

Aliases, unlike functions or variables, don’t create any new data. They are just a new symbol and metadata for an existing position.

Aliases have a name and an aliasee that is either a global value or a constant expression.

Aliases may have an optional linkage type, an optional runtime preemption specifier, an optional visibility style, an optional DLL storage class and an optional tls model.

Syntax:

@<Name> = [Linkage] [PreemptionSpecifier] [Visibility] [DLLStorageClass] [ThreadLocal] [(unnamed_addr|local_unnamed_addr)] alias <AliaseeTy>, <AliaseeTy>* @<Aliasee>
          [, partition "name"]

The linkage must be one of private, internal, linkonce, weak, linkonce_odr, weak_odr, external, available_externally. Note that some system linkers might not correctly handle dropping a weak symbol that is aliased.

Aliases that are not unnamed_addr are guaranteed to have the same address as the aliasee expression. unnamed_addr ones are only guaranteed to point to the same content.

If the local_unnamed_addr attribute is given, the address is known to not be significant within the module.

Since aliases are only a second name, some restrictions apply, of which some can only be checked when producing an object file:

  • The expression defining the aliasee must be computable at assembly time. Since it is just a name, no relocations can be used.

  • No alias in the expression can be weak as the possibility of the intermediate alias being overridden cannot be represented in an object file.

  • If the alias has the available_externally linkage, the aliasee must be an available_externally global value; otherwise the aliasee can be an expression but no global value in the expression can be a declaration, since that would require a relocation, which is not possible.

  • If either the alias or the aliasee may be replaced by a symbol outside the module at link time or runtime, optimizations cannot replace the alias with the aliasee, since the behavior may be different. The alias may be used as a name guaranteed to point to the content in the current module.
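A minimal sketch of alias definitions, assuming the opaque ptr type used by the other examples in this document. All names are hypothetical:

```llvm
@value = global i32 7

define void @impl() {
  ret void
}

; A second name for the variable @value.
@value_alias = alias i32, ptr @value

; A weak alias for the function @impl. Because it is unnamed_addr, it is only
; guaranteed to point to the same content, not to have the same address.
@impl_alias = weak unnamed_addr alias void (), ptr @impl
```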

IFuncs

IFuncs, like aliases, don’t create any new data or functions. They are just a new symbol that is resolved at runtime by calling a resolver function.

On ELF platforms, IFuncs are resolved by the dynamic linker at load time. On Mach-O platforms, they are lowered in terms of .symbol_resolver functions, which lazily resolve the callee the first time they are called.

An IFunc may have an optional linkage type and an optional visibility style.

Syntax:

@<Name> = [Linkage] [PreemptionSpecifier] [Visibility] ifunc <IFuncTy>, <ResolverTy>* @<Resolver>
          [, partition "name"]
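As a sketch of how an IFunc ties a symbol to a resolver, assuming the opaque ptr type; the function names are hypothetical and a real resolver would typically choose among several implementations:

```llvm
define internal i32 @square_generic(i32 %x) {
  %r = mul i32 %x, %x
  ret i32 %r
}

; The resolver returns the address of the implementation to bind.
define internal ptr @resolve_square() {
  ret ptr @square_generic
}

; Calls to @square are routed to whatever @resolve_square returned.
@square = ifunc i32 (i32), ptr @resolve_square
```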

Comdats

Comdat IR provides access to object file COMDAT/section group functionality which represents interrelated sections.

Comdats have a name which represents the COMDAT key and a selection kind to provide input on how the linker deduplicates comdats with the same key in two different object files. A comdat must be included or omitted as a unit. Discarding the whole comdat is allowed but discarding a subset is not.

A global object may be a member of at most one comdat. Aliases are placed in the same COMDAT that their aliasee computes to, if any.

Syntax:

$<Name> = comdat SelectionKind

For selection kinds other than nodeduplicate, only one of the duplicate comdats may be retained by the linker and the members of the remaining comdats must be discarded. The following selection kinds are supported:

any

The linker may choose any COMDAT key; the choice is arbitrary.

exactmatch

The linker may choose any COMDAT key but the sections must contain the same data.

largest

The linker will choose the section containing the largest COMDAT key.

nodeduplicate

No deduplication is performed.

samesize

The linker may choose any COMDAT key but the sections must contain the same amount of data.

  • XCOFF and Mach-O don’t support COMDATs.

  • COFF supports all selection kinds. Non-nodeduplicate selection kinds need a non-local linkage COMDAT symbol.

  • ELF supports any and nodeduplicate.

  • WebAssembly only supports any.

Here is an example of a COFF COMDAT where a function will only be selected if the COMDAT key’s section is the largest:

$foo = comdat largest
@foo = global i32 2, comdat($foo)

define void @bar() comdat($foo) {
  ret void
}

In a COFF object file, this will create a COMDAT section with selection kind IMAGE_COMDAT_SELECT_LARGEST containing the contents of the @foo symbol and another COMDAT section with selection kind IMAGE_COMDAT_SELECT_ASSOCIATIVE which is associated with the first COMDAT section and contains the contents of the @bar symbol.

As a syntactic sugar the $name can be omitted if the name is the same as the global name:

$foo = comdat any
@foo = global i32 2, comdat
@bar = global i32 3, comdat($foo)

There are some restrictions on the properties of the global object. It, or an alias to it, must have the same name as the COMDAT group when targeting COFF. The contents and size of this object may be used during link-time to determine which COMDAT groups get selected depending on the selection kind. Because the name of the object must match the name of the COMDAT group, the linkage of the global object must not be local; local symbols can get renamed if a collision occurs in the symbol table.

The combined use of COMDATs and section attributes may yield surprising results. For example:

$foo = comdat any
$bar = comdat any
@g1 = global i32 42, section "sec", comdat($foo)
@g2 = global i32 42, section "sec", comdat($bar)

From the object file perspective, this requires the creation of two sections with the same name. This is necessary because both globals belong to different COMDAT groups and COMDATs, at the object file level, are represented by sections.

Note that certain IR constructs like global variables and functions may create COMDATs in the object file in addition to any which are specified using COMDAT IR. This arises when the code generator is configured to emit globals in individual sections (e.g. when -data-sections or -function-sections is supplied to llc).

Named Metadata

Named metadata is a collection of metadata. Metadata nodes (but not metadata strings) are the only valid operands for a named metadata.

  1. Named metadata are represented as a string of characters with the metadata prefix. The rules for metadata names are the same as for identifiers, but quoted names are not allowed. "\xx" type escapes are still valid, which allows any character to be part of a name.

Syntax:

; Some unnamed metadata nodes, which are referenced by the named metadata.
!0 = !{!"zero"}
!1 = !{!"one"}
!2 = !{!"two"}
; A named metadata.
!name = !{!0, !1, !2}

Parameter Attributes

The return type and each parameter of a function type may have a set of parameter attributes associated with them. Parameter attributes are used to communicate additional information about the result or parameters of a function. Parameter attributes are considered to be part of the function, not of the function type, so functions with different parameter attributes can have the same function type. Parameter attributes can be placed both on function declarations/definitions, and at call-sites.

Parameter attributes are either simple keywords or strings that follow the specified type. Multiple parameter attributes, when required, are separated by spaces. For example:

; On function declarations/definitions:
declare i32 @printf(ptr noalias captures(none), ...)
declare i32 @atoi(i8 zeroext)
declare signext i8 @returns_signed_char()
define void @baz(i32 "amdgpu-flat-work-group-size"="1,256" %x)

; On call-sites:
call i32 @atoi(i8 zeroext %x)
call signext i8 @returns_signed_char()

Note that any attributes for the function result (nonnull, signext) come before the result type.

Parameter attributes can be broadly separated into two kinds: ABI attributes that affect how values are passed to/from functions, like zeroext, inreg, byval, or sret, and optimization attributes, which provide additional optimization guarantees, like noalias, nonnull and dereferenceable.

ABI attributes must be specified both at the function declaration/definition and call-site, otherwise the behavior may be undefined. ABI attributes cannot be safely dropped. Optimization attributes do not have to match between call-site and function: The intersection of their implied semantics applies. Optimization attributes can also be freely dropped.

If an integer argument to a function is not marked signext/zeroext/noext, the kind of extension used is target-specific. Some targets depend for correctness on the kind of extension to be explicitly specified.

Currently, only the following parameter attributes are defined:

zeroext

This indicates to the code generator that the parameter or return value should be zero-extended to the extent required by the target’s ABI by the caller (for a parameter) or the callee (for a return value).

signext

This indicates to the code generator that the parameter or return value should be sign-extended to the extent required by the target’s ABI (which is usually 32 bits) by the caller (for a parameter) or the callee (for a return value).

noext

This indicates to the code generator that the parameter or return value has the high bits undefined, as for a struct in register, and therefore does not need to be sign or zero extended. This is the same as default behavior and is only actually used (by some targets) to validate that one of the attributes is always present.

inreg

This indicates that this parameter or return value should be treated in a special target-dependent fashion while emitting code for a function call or return (usually, by putting it in a register as opposed to memory, though some targets use it to distinguish between two different kinds of registers). Use of this attribute is target-specific.

byval(<ty>)

This indicates that the pointer parameter should really be passed by value to the function. The attribute implies that a hidden copy of the pointee is made between the caller and the callee, so the callee is unable to modify the value in the caller. This attribute is only valid on LLVM pointer arguments. It is generally used to pass structs and arrays by value, but is also valid on pointers to scalars. The copy is considered to belong to the caller not the callee (for example, readonly functions should not write to byval parameters). This is not a valid attribute for return values.

The byval type argument indicates the in-memory value type.

The byval attribute also supports specifying an alignment with the align attribute. It indicates the alignment of the stack slot to form and the known alignment of the pointer specified to the call site. If the alignment is not specified, then the code generator makes a target-specific assumption.
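A minimal sketch of byval with an explicit alignment, written on both the declaration and the call site because it is an ABI attribute. The struct type and function names are hypothetical:

```llvm
%struct.Pair = type { i32, i32 }

declare void @consume(ptr byval(%struct.Pair) align 4)

define void @caller(ptr %p) {
  ; The callee receives a hidden copy of the struct that %p points to.
  call void @consume(ptr byval(%struct.Pair) align 4 %p)
  ret void
}
```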

byref(<ty>)

The byref argument attribute allows specifying the pointee memory type of an argument. This is similar to byval, but does not imply a copy is made anywhere, or that the argument is passed on the stack. This implies the pointer is dereferenceable up to the storage size of the type.

It is not generally permissible to introduce a write to a byref pointer. The pointer may have any address space and may be read only.

This is not a valid attribute for return values.

The alignment for a byref parameter can be explicitly specified by combining it with the align attribute, similar to byval. If the alignment is not specified, then the code generator makes a target-specific assumption.

This is intended for representing ABI constraints, and is not intended to be inferred for optimization use.

preallocated(<ty>)

This indicates that the pointer parameter should really be passed by value to the function, and that the pointer parameter’s pointee has already been initialized before the call instruction. This attribute is only valid on LLVM pointer arguments. The argument must be the value returned by the appropriate llvm.call.preallocated.arg on non musttail calls, or the corresponding caller parameter in musttail calls, although it is ignored during codegen.

A non musttail function call with a preallocated attribute in any parameter must have a "preallocated" operand bundle. A musttail function call cannot have a "preallocated" operand bundle.

The preallocated attribute requires a type argument.

The preallocated attribute also supports specifying an alignment with the align attribute. It indicates the alignment of the stack slot to form and the known alignment of the pointer specified to the call site. If the alignment is not specified, then the code generator makes a target-specific assumption.

inalloca(<ty>)

The inalloca argument attribute allows the caller to take the address of outgoing stack arguments. An inalloca argument must be a pointer to stack memory produced by an alloca instruction. The alloca, or argument allocation, must also be tagged with the inalloca keyword. Only the last argument may have the inalloca attribute, and that argument is guaranteed to be passed in memory.

An argument allocation may be used by a call at most once because the call may deallocate it. The inalloca attribute cannot be used in conjunction with other attributes that affect argument storage, like inreg, nest, sret, or byval. The inalloca attribute also disables LLVM’s implicit lowering of large aggregate return values, which means that frontend authors must lower them with sret pointers.

When the call site is reached, the argument allocation must have been the most recent stack allocation that is still live, or the behavior is undefined. It is possible to allocate additional stack space after an argument allocation and before its call site, but it must be cleared off with llvm.stackrestore.

The inalloca attribute requires a type argument.

See Design and Usage of the InAlloca Attribute for more information on how to use this attribute.

sret(<ty>)

This indicates that the pointer parameter specifies the address of a structure that is the return value of the function in the source program. This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap and to be properly aligned.

The sret type argument specifies the in-memory type.

A function that accepts an sret argument must return void. A return value may not be sret.
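The sret rules can be sketched as follows. The struct type and function names are hypothetical; the caller provides valid, aligned storage and the callee returns void:

```llvm
%struct.Big = type { [4 x i64] }

; The callee writes its result through the sret pointer and returns void.
define void @make_big(ptr sret(%struct.Big) align 8 %out) {
  store i64 0, ptr %out, align 8
  ret void
}

define void @use_big() {
  ; The caller allocates the storage for the returned structure.
  %tmp = alloca %struct.Big, align 8
  call void @make_big(ptr sret(%struct.Big) align 8 %tmp)
  ret void
}
```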

elementtype(<ty>)

The elementtype argument attribute can be used to specify a pointer element type in a way that is compatible with opaque pointers.

The elementtype attribute by itself does not carry any specific semantics. However, certain intrinsics may require this attribute to be present and assign it particular semantics. This will be documented on individual intrinsics.

The attribute may only be applied to pointer typed arguments of intrinsic calls. It cannot be applied to non-intrinsic calls, and cannot be applied to parameters on function declarations. For non-opaque pointers, the type passed to elementtype must match the pointer element type.

align <n> or align(<n>)

This indicates that the pointer value or vector of pointers has the specified alignment. If applied to a vector of pointers, all pointers (elements) have the specified alignment. If the pointer value does not have the specified alignment, a poison value is returned or passed instead. The align attribute should be combined with the noundef attribute to ensure a pointer is aligned, or otherwise the behavior is undefined. Note that align 1 has no effect on non-byval, non-preallocated arguments.

Note that this attribute has additional semantics when combined with the byval or preallocated attribute, which are documented there.
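A short sketch of align combined with noundef on a pointer parameter; the function name is hypothetical:

```llvm
; With noundef, passing a pointer that is not 16-byte aligned is undefined
; behavior rather than merely producing poison, so the aligned load is safe
; to assume.
define i32 @load_aligned(ptr noundef align 16 %p) {
  %v = load i32, ptr %p, align 16
  ret i32 %v
}
```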

noalias

This indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function. If there are other accesses not based on the argument or return value, the behavior is undefined. The attribute on a return value also has additional semantics, as described below. Both the caller and the callee share the responsibility of ensuring that these requirements are met. For further details, please see the discussion of the NoAlias response in alias analysis.

Note that this definition of noalias is intentionally similar to the definition of restrict in C99 for function arguments.

For function return values, C99’s restrict is not meaningful, while LLVM’s noalias is. Furthermore, the semantics of the noalias attribute on return values are stronger than the semantics of the attribute when used on function arguments. On function return values, the noalias attribute indicates that the function acts like a system memory allocation function, returning a pointer to allocated storage disjoint from the storage for any other object accessible to the caller.
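The two uses of noalias can be sketched as follows. The allocator and copy function are hypothetical names, not part of LLVM:

```llvm
; On a return value: the result behaves like freshly allocated storage,
; disjoint from any other object accessible to the caller.
declare noalias ptr @my_alloc(i64)

; On arguments, similar to C99 restrict: the memory written through %dst is
; not accessed through %src during the call.
define void @copy_one(ptr noalias %dst, ptr noalias %src) {
  %v = load i32, ptr %src
  store i32 %v, ptr %dst
  ret void
}
```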

captures(...)

This attribute restricts the ways in which the callee may capture the pointer. This is not a valid attribute for return values. This attribute applies only to the particular copy of the pointer passed in this argument.

The argument of captures is a list of captured pointer components, which may be none, or a combination of:

  • address: The integral address of the pointer.

  • address_is_null (subset of address): Whether the address is null.

  • provenance: The ability to access the pointer for both read and write after the function returns.

  • read_provenance (subset of provenance): The ability to access the pointer only for reads after the function returns.

Additionally, it is possible to specify that some components are only captured in certain locations. Currently only the return value (ret) and other (default) locations are supported.

The pointer capture section discusses these semantics in more detail.

Some examples of how to use the attribute:

  • captures(none): Pointer not captured.

  • captures(address, provenance): Equivalent to omitting the attribute.

  • captures(address): Address may be captured, but not provenance.

  • captures(address_is_null): Only captures whether the address is null.

  • captures(address, read_provenance): Both address and provenance captured, but only for read-only access.

  • captures(ret: address, provenance): Pointer captured through return value only.

  • captures(address_is_null, ret: address, provenance): The whole pointer is captured through the return value, and additionally whether the pointer is null is captured in some other way.
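A few of the forms listed above, written as they might appear on declarations. The function names are hypothetical, and the spelling of the ret: location assumes the textual form shown in the list:

```llvm
; The callee does not capture the pointer at all.
declare void @inspect(ptr captures(none))

; The callee may only reveal whether the pointer is null.
declare i1 @is_null(ptr captures(address_is_null))

; The pointer is captured only through the return value.
declare ptr @passthrough(ptr captures(ret: address, provenance))
```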

nofree

This indicates that the callee does not free the pointer argument. This is not a valid attribute for return values.

nest

This indicates that the pointer parameter can be excised using the trampoline intrinsics. This is not a valid attribute for return values and can only be applied to one parameter.

returned

This indicates that the function always returns the argument as its return value. This is a hint to the optimizer and code generator used when generating the caller, allowing value propagation, tail call optimization, and omission of register saves and restores in some cases; it is not checked or enforced when generating the callee. The parameter and the function return type must be valid operands for the bitcast instruction. This is not a valid attribute for return values and can only be applied to one parameter.
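As a sketch of the returned attribute, using a hypothetical memset-like function:

```llvm
declare ptr @fill(ptr returned, i8, i64)

define ptr @fill16(ptr %buf) {
  %r = call ptr @fill(ptr %buf, i8 0, i64 16)
  ; Because of the returned attribute, the optimizer may treat %r as %buf.
  ret ptr %r
}
```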

nonnull

This indicates that the parameter or return pointer is not null. This attribute may only be applied to pointer typed parameters. This is not checked or enforced by LLVM; if the parameter or return pointer is null, a poison value is returned or passed instead. The nonnull attribute should be combined with the noundef attribute to ensure a pointer is not null or otherwise the behavior is undefined.

dereferenceable(<n>)

This indicates that the parameter or return pointer is dereferenceable. This attribute may only be applied to pointer typed parameters. A pointer that is dereferenceable can be loaded from speculatively without a risk of trapping. The number of bytes known to be dereferenceable must be provided in parentheses. It is legal for the number of bytes to be less than the size of the pointee type. The nonnull attribute does not imply dereferenceability (consider a pointer to one element past the end of an array), however dereferenceable(<n>) does imply nonnull in addrspace(0) (which is the default address space), except if the null_pointer_is_valid function attribute is present. n should be a positive number. The pointer should be well defined, otherwise it is undefined behavior. This means dereferenceable(<n>) implies noundef. When used in an assume operand bundle, more restricted semantics apply. See assume operand bundles for more details.

dereferenceable_or_null(<n>)

This indicates that the parameter or return value isn’t both non-null and non-dereferenceable (up to <n> bytes) at the same time. All non-null pointers tagged with dereferenceable_or_null(<n>) are dereferenceable(<n>). For address space 0 dereferenceable_or_null(<n>) implies that a pointer is exactly one of dereferenceable(<n>) or null, and in other address spaces dereferenceable_or_null(<n>) implies that a pointer is at least one of dereferenceable(<n>) or null (i.e. it may be both null and dereferenceable(<n>)). This attribute may only be applied to pointer typed parameters.
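The difference between the two dereferenceability attributes can be sketched with two hypothetical functions in the default address space:

```llvm
; %p is known to be non-null and to point to at least 4 readable bytes.
define i32 @read_first(ptr noundef nonnull dereferenceable(4) %p) {
  %v = load i32, ptr %p
  ret i32 %v
}

; %p is either null or points to at least 4 readable bytes, so the load is
; only performed after a null check.
define i32 @read_or_zero(ptr dereferenceable_or_null(4) %p) {
entry:
  %isnull = icmp eq ptr %p, null
  br i1 %isnull, label %if.null, label %if.nonnull

if.nonnull:
  %v = load i32, ptr %p
  ret i32 %v

if.null:
  ret i32 0
}
```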

swiftself

This indicates that the parameter is the self/context parameter. This is not a valid attribute for return values and can only be applied to one parameter.

swiftasync

This indicates that the parameter is the asynchronous context parameter and triggers the creation of a target-specific extended frame record to store this pointer. This is not a valid attribute for return values and can only be applied to one parameter.

swifterror

This attribute is motivated to model and optimize Swift error handling. It can be applied to a parameter with pointer to pointer type or a pointer-sized alloca. At the call site, the actual argument that corresponds to a swifterror parameter has to come from a swifterror alloca or the swifterror parameter of the caller. A swifterror value (either the parameter or the alloca) can only be loaded and stored from, or used as a swifterror argument. This is not a valid attribute for return values and can only be applied to one parameter.

These constraints allow the calling convention to optimize access to swifterror variables by associating them with a specific register at call boundaries rather than placing them in memory. Since this does change the calling convention, a function which uses the swifterror attribute on a parameter is not ABI-compatible with one which does not.

These constraints also allow LLVM to assume that a swifterror argument does not alias any other memory visible within a function and that a swifterror alloca passed as an argument does not escape.
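A sketch of the swifterror usage pattern described above. The function names are hypothetical, and the use of swiftcc is an assumption made for illustration:

```llvm
declare swiftcc void @may_fail(ptr swifterror)

define swiftcc void @call_may_fail() {
  ; A pointer-sized alloca tagged swifterror holds the error value.
  %err = alloca swifterror ptr, align 8
  store ptr null, ptr %err
  ; The argument for a swifterror parameter comes from a swifterror alloca.
  call swiftcc void @may_fail(ptr swifterror %err)
  ret void
}
```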

immarg

This indicates the parameter is required to be an immediate value. This must be a trivial immediate integer or floating-point constant. Undef or constant expressions are not valid. This is only valid on intrinsic declarations and cannot be applied to a call site or arbitrary function.
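For instance, the volatile flag of the llvm.memcpy intrinsic is an immarg parameter, so a call has to pass a literal constant for it. The wrapper function name below is hypothetical:

```llvm
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1 immarg)

define void @copy16(ptr %dst, ptr %src) {
  ; The final argument must be an immediate such as false, not an SSA value.
  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 16, i1 false)
  ret void
}
```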

noundef

This attribute applies to parameters and return values. If the value representation contains any undefined or poison bits, the behavior is undefined. Note that this does not refer to padding introduced by the type’s storage representation.

If memory sanitizer is enabled, noundef becomes an ABI attribute and must match between the call-site and the function definition.

nofpclass(<test mask>)

This attribute applies to parameters and return values with floating-point and vector of floating-point types, as well as supported aggregates of such types (matching the supported types for fast-math flags). The test mask has the same format as the second argument to the llvm.is.fpclass, and indicates which classes of floating-point values are not permitted for the value. For example, a bitmask of 3 indicates the parameter may not be a NaN.

If the value is a floating-point class indicated by the nofpclass test mask, a poison value is passed or returned instead.

Listing 20 The following invariants hold

@llvm.is.fpclass(nofpclass(test_mask) %x, test_mask) => false
@llvm.is.fpclass(nofpclass(test_mask) %x, ~test_mask) => true
nofpclass(all) => poison

In textual IR, various string names are supported for readability and can be combined. For example nofpclass(nan pinf nzero) evaluates to a mask of 547.

This does not depend on the floating-point environment. For example, a function parameter marked nofpclass(zero) indicates no zero inputs. If this is applied to an argument in a function marked with “denormal-fp-math” indicating zero treatment of input denormals, it does not imply the value cannot be a denormal value which would compare equal to 0.

Table 1: Recognized test mask names

Name     Floating-point class    Bitmask value
-----    --------------------    -------------
nan      Any NaN                 3
inf      +/- infinity            516
norm     +/- normal              264
sub      +/- subnormal           144
zero     +/- 0                   96
all      All values              1023
snan     Signaling NaN           1
qnan     Quiet NaN               2
ninf     Negative infinity       4
nnorm    Negative normal         8
nsub     Negative subnormal      16
nzero    Negative zero           32
pzero    Positive zero           64
psub     Positive subnormal      128
pnorm    Positive normal         256
pinf     Positive infinity       512
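To illustrate how nofpclass mask names combine, here is a small sketch using a hypothetical function @halve. The mask nan inf corresponds to 3 | 516 = 519 according to the table of test mask names; the function name and body are assumptions made for the example.

```llvm
; Hypothetical function: neither the argument nor the result may be a NaN
; or an infinity. If a caller passes such a value, %x is poison.
define nofpclass(nan inf) float @halve(float nofpclass(nan inf) %x) {
  ; Halving a finite value cannot produce a NaN or an infinity.
  %r = fmul float %x, 5.000000e-01
  ret float %r
}
```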

alignstack(<n>)

This indicates the alignment that should be considered by the backend when assigning this parameter or return value to a stack slot during calling convention lowering. The enforcement of the specified alignment is target-dependent, as target-specific calling convention rules may override this value. This attribute serves the purpose of carrying language specific alignment information that is not mapped to base types in the backend (for example, over-alignment specification through language attributes).

allocalign

The function parameter marked with this attribute is the alignment in bytes of the newly allocated block returned by this function. The returned value must either have the specified alignment or be the null pointer. The return value MAY be more aligned than the requested alignment, but not less aligned. Invalid (e.g. non-power-of-2) alignments are permitted for the allocalign parameter, so long as the returned pointer is null. This attribute may only be applied to integer parameters.

allocptr

The function parameter marked with this attribute is the pointer that will be manipulated by the allocator. For a realloc-like function the pointer will be invalidated upon success (but the same address may be returned), for a free-like function the pointer will always be invalidated.

readnone

This attribute indicates that the function does not dereference that pointer argument, even though it may read or write the memory that the pointer points to if accessed through other pointers.

If a function reads from or writes to a readnone pointer argument, the behavior is undefined.

readonly

This attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to.

If a function writes to a readonly pointer argument, the behavior is undefined.

writeonly

This attribute indicates that the function may write to, but does not read through this pointer argument (even though it may read from the memory that the pointer points to).

This attribute is understood in the same way as the memory(write) attribute. That is, the pointer may still be read as long as the read is not observable outside the function. See the memory documentation for precise semantics.

writable

This attribute is only meaningful in conjunction with dereferenceable(N) or another attribute that implies the first N bytes of the pointer argument are dereferenceable.

In that case, the attribute indicates that the first N bytes will be (non-atomically) loaded and stored back on entry to the function.

This implies that it’s possible to introduce spurious stores on entry to the function without introducing traps or data races. This does not necessarily hold throughout the whole function, as the pointer may escape to a different thread during the execution of the function. See also the atomic optimization guide.

The “other attributes” that imply dereferenceability are dereferenceable_or_null (if the pointer is non-null) and the sret, byval, byref, inalloca, preallocated family of attributes. Note that not all of these combinations are useful, e.g. byval arguments are known to be writable even without this attribute.

The writable attribute cannot be combined with readnone, readonly or a memory attribute that does not contain argmem: write.

initializes((Lo1,Hi1),...)

This attribute indicates that the function initializes the ranges of the pointer parameter’s memory [%p+LoN, %p+HiN). Colloquially, this means that all bytes in the specified range are written before the function returns, and not read prior to the initializing write. If the function unwinds, the write may not happen.

Formally, this is specified in terms of an “initialized” shadow state for all bytes in the range, which is set to “not initialized” at function entry. If a memory access is performed through a pointer based on the argument, and an accessed byte has not been marked as “initialized” yet, then:

  • If the byte is stored with a non-volatile, non-atomic write, mark it as “initialized”.

  • If the byte is stored with a volatile or atomic write, the behavior is undefined.

  • If the byte is loaded, return a poison value.

Additionally, if the function returns normally, write an undef value to all bytes that are part of the range and have not been marked as “initialized”.

This attribute only holds for the memory accessed via this pointer parameter. Other arbitrary accesses to the same memory via other pointers are allowed.

The writable or dereferenceable attributes do not imply the initializes attribute. The initializes attribute does not imply writeonly since initializes allows reading from the pointer after writing.

This attribute is a list of constant ranges in ascending order with no overlapping or consecutive list elements. LoN/HiN are 64-bit integers, and negative values are allowed in case the argument points partway into an allocation. An empty list is not allowed.

On a byval argument, initializes refers to the given parts of the callee copy being overwritten. A byval callee can never initialize the original caller memory passed to the byval argument.
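The colloquial reading of initializes can be sketched as follows. The function @init_pair is hypothetical; it writes every byte in [%p+0, %p+8) before returning and never reads those bytes first, which is the behavior the attribute describes.

```llvm
; Hypothetical helper: bytes [0, 8) of %p are written before the function
; returns and are not read before the initializing writes.
define void @init_pair(ptr initializes((0, 8)) %p) {
  store i32 0, ptr %p                             ; bytes [0, 4)
  %hi = getelementptr inbounds i8, ptr %p, i64 4
  store i32 1, ptr %hi                            ; bytes [4, 8)
  ret void
}
```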

dead_on_unwind

At a high level, this attribute indicates that the pointer argument is dead if the call unwinds, in the sense that the caller will not depend on the contents of the memory. Stores that would only be visible on the unwind path can be elided.

More precisely, the behavior is as-if any memory written through the pointer during the execution of the function is overwritten with a poison value on unwind. This includes memory written by the implicit write implied by the writable attribute. The caller is allowed to access the affected memory, but all loads that are not preceded by a store will return poison.

This attribute cannot be applied to return values.
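A minimal sketch of where dead_on_unwind might appear, assuming a hypothetical function @make that fills an sret result and a hypothetical callee @may_unwind that can unwind. The type %T and both function names are illustrative only.

```llvm
%T = type { i64, i64 }

; Hypothetical callee that may unwind.
declare void @may_unwind()

define void @make(ptr sret(%T) dead_on_unwind %out) {
  ; If @may_unwind unwinds, the caller does not depend on this store:
  ; the memory behaves as if it were overwritten with poison.
  store i64 1, ptr %out
  call void @may_unwind()
  %second = getelementptr inbounds i8, ptr %out, i64 8
  store i64 2, ptr %second
  ret void
}
```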

dead_on_return

This attribute indicates that the memory pointed to by the argument is dead upon function return, both upon normal return and if the call unwinds, meaning that the caller will not depend on its contents. Stores that would be observable either on the return path or on the unwind path may be elided.

Specifically, the behavior is as-if any memory written through the pointer during the execution of the function is overwritten with a poison value upon function return. The caller may access the memory, but any load not preceded by a store will return poison.

This attribute does not imply aliasing properties. For pointer arguments that do not alias other memory locations, the noalias attribute may be used in conjunction. Conversely, this attribute always implies dead_on_unwind.

This attribute cannot be applied to return values.

range(<ty><a>,<b>)

This attribute expresses the possible range of the parameter or return value. If the value is not in the specified range, it is converted to poison. The arguments passed to range have the following properties:

  • The type must match the scalar type of the parameter or return value.

  • The pair a,b represents the range [a,b).

  • Both a and b are constants.

  • The range is allowed to wrap.

  • The empty range is represented using 0,0.

  • Otherwise, a and b are not allowed to be equal.

This attribute may only be applied to parameters or return values with integer or vector of integer types.

For vector-typed parameters, the range is applied element-wise.
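The following sketch shows range on a return value and a wrapping range on a parameter. The names @last_digit and @takes_nonzero are hypothetical.

```llvm
; Hypothetical function: an unsigned remainder by 10 always lies in [0, 10).
define range(i32 0, 10) i32 @last_digit(i32 %x) {
  %r = urem i32 %x, 10
  ret i32 %r
}

; Hypothetical declaration with a wrapping range: [1, 0) covers every i8
; value except 0.
declare void @takes_nonzero(i8 range(i8 1, 0))
```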

Garbage Collector Strategy Names

Each function may specify a garbage collector strategy name, which is simply a string:

define void @f() gc "name" { ... }

The supported values of name include those built in to LLVM and any provided by loaded plugins. Specifying a GC strategy will cause the compiler to alter its output in order to support the named garbage collection algorithm. Note that LLVM itself does not contain a garbage collector; this functionality is restricted to generating machine code which can interoperate with a collector provided externally.

Prefix Data

Prefix data is data associated with a function which the code generator will emit immediately before the function’s entrypoint. The purpose of this feature is to allow frontends to associate language-specific runtime metadata with specific functions and make it available through the function pointer while still allowing the function pointer to be called.

To access the data for a given function, a program may bitcast the function pointer to a pointer to the constant’s type and dereference index -1. This implies that the IR symbol points just past the end of the prefix data. For instance, take the example of a function annotated with a single i32,

define void @f() prefix i32 123 { ... }

The prefix data can be referenced as,

%a = getelementptr inbounds i32, ptr @f, i32 -1
%b = load i32, ptr %a

Prefix data is laid out as if it were an initializer for a global variable of the prefix data’s type. The function will be placed such that the beginning of the prefix data is aligned. This means that if the size of the prefix data is not a multiple of the alignment size, the function’s entrypoint will not be aligned. If alignment of the function’s entrypoint is desired, padding must be added to the prefix data.

A function may have prefix data but no body. This has similar semantics to the available_externally linkage in that the data may be used by the optimizers but will not be emitted in the object file.

Prologue Data

The prologue attribute allows arbitrary code (encoded as bytes) to be inserted prior to the function body. This can be used for enabling function hot-patching and instrumentation.

To maintain the semantics of ordinary function calls, the prologue data must have a particular format. Specifically, it must begin with a sequence of bytes which decode to a sequence of machine instructions, valid for the module’s target, which transfer control to the point immediately succeeding the prologue data, without performing any other visible action. This allows the inliner and other passes to reason about the semantics of the function definition without needing to reason about the prologue data. Obviously this makes the format of the prologue data highly target dependent.

A trivial example of valid prologue data for the x86 architecture is i8 144, which encodes the nop instruction:

define void @f() prologue i8 144 { ... }

Generally prologue data can be formed by encoding a relative branch instruction which skips the metadata, as in this example of valid prologue data for the x86_64 architecture, where the first two bytes encode jmp .+10:

%0 = type <{ i8, i8, ptr }>
define void @f() prologue %0 <{ i8 235, i8 8, ptr @md}> { ... }

A function may have prologue data but no body. This has similar semantics to the available_externally linkage in that the data may be used by the optimizers but will not be emitted in the object file.

Personality Function

The personality attribute permits functions to specify what function to use for exception handling.
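As an illustration, the sketch below attaches the Itanium C++ personality routine to a function that contains an invoke. The functions @may_throw and @f are hypothetical; the landing pad simply performs cleanup and resumes unwinding.

```llvm
declare i32 @__gxx_personality_v0(...)

; Hypothetical callee that may unwind.
declare void @may_throw()

define void @f() personality ptr @__gxx_personality_v0 {
entry:
  invoke void @may_throw()
          to label %cont unwind label %lpad

cont:
  ret void

lpad:
  ; Cleanup-only landing pad: continue unwinding into the caller.
  %lp = landingpad { ptr, i32 }
          cleanup
  resume { ptr, i32 } %lp
}
```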

Attribute Groups

Attribute groups are groups of attributes that are referenced by objects within the IR. They are important for keeping .ll files readable, because a lot of functions will use the same set of attributes. In the degenerate case of a .ll file that corresponds to a single .c file, the single attribute group will capture the important command line flags used to build that file.

An attribute group is a module-level object. To use an attribute group, an object references the attribute group’s ID (e.g. #37). An object may refer to more than one attribute group. In that situation, the attributes from the different groups are merged.

Here is an example of attribute groups for a function that should always be inlined, has a stack alignment of 4, and which shouldn’t use SSE instructions:

; Target-independent attributes:
attributes #0 = { alwaysinline alignstack=4 }

; Target-dependent attributes:
attributes #1 = { "no-sse" }

; Function @f has attributes: alwaysinline, alignstack=4, and "no-sse".
define void @f() #0 #1 { ... }

Function Attributes

Function attributes are set to communicate additional information abouta function. Function attributes are considered to be part of thefunction, not of the function type, so functions with different functionattributes can have the same function type.

Function attributes are simple keywords or strings that follow the specified type. Multiple attributes, when required, are separated by spaces. For example:

define void @f() noinline { ... }
define void @f() alwaysinline { ... }
define void @f() alwaysinline optsize { ... }
define void @f() optsize { ... }
define void @f() "no-sse" { ... }

alignstack(<n>)

This attribute indicates that, when emitting the prologue and epilogue, the backend should forcibly align the stack pointer. Specify the desired alignment, which must be a power of two, in parentheses.

"alloc-family"="FAMILY"

This indicates which “family” an allocator function is part of. To avoid collisions, the family name should match the mangled name of the primary allocator function, that is “malloc” for malloc/calloc/realloc/free, “_Znwm” for ::operator::new and ::operator::delete, and “_ZnwmSt11align_val_t” for aligned ::operator::new and ::operator::delete. Matching malloc/realloc/free calls within a family can be optimized, but mismatched ones will be left alone.

allockind("KIND")

Describes the behavior of an allocation function. The KIND string contains comma separated entries from the following options:

  • “alloc”: the function returns a new block of memory or null.

  • “realloc”: the function returns a new block of memory or null. If the result is non-null the memory contents from the start of the block up to the smaller of the original allocation size and the new allocation size will match that of the allocptr argument and the allocptr argument is invalidated, even if the function returns the same address.

  • “free”: the function frees the block of memory specified by allocptr. Functions marked as “free” allockind must return void.

  • “uninitialized”: Any newly-allocated memory (either a new block from an “alloc” function or the enlarged capacity from a “realloc” function) will be uninitialized.

  • “zeroed”: Any newly-allocated memory (either a new block from an “alloc” function or the enlarged capacity from a “realloc” function) will be zeroed.

  • “aligned”: the function returns memory aligned according to the allocalign parameter.

The first three options are mutually exclusive, and the remaining options describe more details of how the function behaves. The remaining options are invalid for “free”-type functions.

"alloc-variant-zeroed"="FUNCTION"

This attribute indicates that another function is equivalent to an allocator function, but returns zeroed memory. The function must have “zeroed” allocation behavior, the same alloc-family, and take exactly the same arguments.

allocsize(<EltSizeParam>[,<NumEltsParam>])

This attribute indicates that the annotated function will always return at least a given number of bytes (or null). Its arguments are zero-indexed parameter numbers; if one argument is provided, then it’s assumed that at least CallSite.Args[EltSizeParam] bytes will be available at the returned pointer. If two are provided, then it’s assumed that CallSite.Args[EltSizeParam] * CallSite.Args[NumEltsParam] bytes are available. The referenced parameters must be integer types. No assumptions are made about the contents of the returned block of memory.
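The allocator-related attributes are typically used together. The declarations below are a sketch of a hypothetical allocator family; the names @my_alloc, @my_calloc, @my_realloc, and @my_free, and the family string "my_alloc", are assumptions made for the example and do not refer to any real library.

```llvm
; Hypothetical primary allocator: returns at least Args[0] bytes, uninitialized.
declare noalias ptr @my_alloc(i64) allockind("alloc,uninitialized") allocsize(0) "alloc-family"="my_alloc"

; Hypothetical calloc-like function: Args[0] * Args[1] zeroed bytes.
declare noalias ptr @my_calloc(i64, i64) allockind("alloc,zeroed") allocsize(0, 1) "alloc-family"="my_alloc"

; Hypothetical realloc-like function: the allocptr argument is invalidated on success.
declare ptr @my_realloc(ptr allocptr, i64) allockind("realloc") allocsize(1) "alloc-family"="my_alloc"

; Hypothetical free-like function: must return void.
declare void @my_free(ptr allocptr) allockind("free") "alloc-family"="my_alloc"
```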

alwaysinline

This attribute indicates that the inliner should attempt to inline this function into callers whenever possible, ignoring any active inlining size threshold for this caller.

builtin

This indicates that the callee function at a call site should be recognized as a built-in function, even though the function’s declaration uses the nobuiltin attribute. This is only valid at call sites for direct calls to functions that are declared with the nobuiltin attribute.

cold

This attribute indicates that this function is rarely called. When computing edge weights, basic blocks post-dominated by a cold function call are also considered to be cold and are thus given low weight.

convergent

This attribute indicates that this function is convergent. When it appears on a call/invoke, the convergent attribute indicates that we should treat the call as though we’re calling a convergent function. This is particularly useful on indirect calls; without this we may treat such calls as though the target is non-convergent.

See Convergent Operation Semantics for further details.

It is an error to call llvm.experimental.convergence.entry from a function that does not have this attribute.

disable_sanitizer_instrumentation

When instrumenting code with sanitizers, it can be important to skip certain functions to ensure no instrumentation is applied to them.

This attribute is not always equivalent to the absence of sanitize_<name> attributes: depending on the specific sanitizer, code can be inserted into functions regardless of the sanitize_<name> attribute to prevent false positive reports.

disable_sanitizer_instrumentation disables all kinds of instrumentation, taking precedence over the sanitize_<name> attributes and other compiler flags.

"dontcall-error"

This attribute denotes that an error diagnostic should be emitted when a call of a function with this attribute is not eliminated via optimization. Front ends can provide optional srcloc metadata nodes on call sites of such callees to attach information about where in the source language such a call came from. A string value can be provided as a note.

"dontcall-warn"

This attribute denotes that a warning diagnostic should be emitted when a call of a function with this attribute is not eliminated via optimization. Front ends can provide optional srcloc metadata nodes on call sites of such callees to attach information about where in the source language such a call came from. A string value can be provided as a note.

fn_ret_thunk_extern

This attribute tells the code generator that returns from functions should be replaced with jumps to externally-defined architecture-specific symbols. For X86, this symbol’s identifier is __x86_return_thunk.

"frame-pointer"

This attribute tells the code generator whether the function should keep the frame pointer. The code generator may emit the frame pointer even if this attribute says the frame pointer can be eliminated. The allowed string values are:

  • "none" (default) - the frame pointer can be eliminated, and its register can be used for other purposes.

  • "reserved" - the frame pointer register must either be updated to point to a valid frame record for the current function, or not be modified.

  • "non-leaf" - the frame pointer should be kept if the function calls other functions.

  • "all" - the frame pointer should be kept.
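For illustration, the string values listed above are written as string attributes with a value. The functions @leaf and @outer below are hypothetical.

```llvm
; Hypothetical leaf function: with "non-leaf", the frame pointer need not be
; kept here because the function makes no calls.
define void @leaf() "frame-pointer"="non-leaf" {
  ret void
}

; Hypothetical caller: "all" asks for the frame pointer to be kept.
define void @outer() "frame-pointer"="all" {
  call void @leaf()
  ret void
}
```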

hot

This attribute indicates that this function is a hot spot of the program execution. The function will be optimized more aggressively and will be placed into a special subsection of the text section to improve locality.

When profile feedback is enabled, this attribute takes precedence over the profile information. By marking a function hot, users can work around the cases where the training input does not have good coverage on all the hot functions.

inlinehint

This attribute indicates that the source code contained a hint that inlining this function is desirable (such as the “inline” keyword in C/C++). It is just a hint; it imposes no requirements on the inliner.

jumptable

This attribute indicates that the function should be added to a jump-instruction table at code-generation time, and that all address-taken references to this function should be replaced with a reference to the appropriate jump-instruction-table function pointer. Note that this creates a new pointer for the original function, which means that code that depends on function-pointer identity can break. So, any function annotated with jumptable must also be unnamed_addr.

memory(...)

This attribute specifies the possible memory effects of the call-site or function. It allows specifying the possible access kinds (none, read, write, or readwrite) for the possible memory location kinds (argmem, inaccessiblemem, errnomem, as well as a default). It is best understood by example:

  • memory(none): Does not access any memory.

  • memory(read): May read (but not write) any memory.

  • memory(write): May write (but not read) any memory.

  • memory(readwrite): May read or write any memory.

  • memory(argmem: read): May only read argument memory.

  • memory(argmem: read, inaccessiblemem: write): May only read argument memory and only write inaccessible memory.

  • memory(argmem: read, errnomem: write): May only read argument memory and only write errno.

  • memory(read, argmem: readwrite): May read any memory (default mode) and additionally write argument memory.

  • memory(readwrite, argmem: none): May access any memory apart from argument memory.

The supported access kinds are:

  • readwrite: Any kind of access to the location is allowed.

  • read: The location is only read. Writing to the location is immediate undefined behavior. This includes the case where the location is read from and then the same value is written back.

  • write: Only writes to the location are observable outside the function call. However, the function may still internally read the location after writing it, as this is not observable. Reading the location prior to writing it results in a poison value.

  • none: No reads or writes to the location are observed outside the function. It is always valid to read and write allocas, and to read global constants, even if memory(none) is used, as these effects are not externally observable.

The supported memory location kinds are:

  • argmem: This refers to accesses that are based on pointer arguments to the function.

  • inaccessiblemem: This refers to accesses to memory which is not accessible by the current module (before return from the function; an allocator function may return newly accessible memory while only accessing inaccessible memory itself). Inaccessible memory is often used to model control dependencies of intrinsics.

  • errnomem: This refers to accesses to the errno variable.

  • The default access kind (specified without a location prefix) applies to all locations that haven’t been specified explicitly, including those that don’t currently have a dedicated location kind (e.g. accesses to globals or captured pointers).

If the memory attribute is not specified, then memory(readwrite) is implied (all memory effects are possible).

The memory effects of a call can be computed as CallSiteEffects & (FunctionEffects | OperandBundleEffects). Thus, the call-site annotation takes precedence over the potential effects described by either the function annotation or the operand bundles.
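The sketch below shows the memory attribute on declarations, on a definition, and on a call site. All function names (@count_bytes, @log_event, @load_twice, @opaque, @caller) are hypothetical.

```llvm
; Hypothetical declaration: only reads memory reachable from its pointer argument.
declare i64 @count_bytes(ptr) memory(argmem: read)

; Hypothetical declaration: only touches memory the current module cannot access.
declare void @log_event(i32) memory(inaccessiblemem: readwrite)

; Hypothetical definition whose only memory accesses are loads through %p.
define i32 @load_twice(ptr %p) memory(argmem: read) {
  %a = load i32, ptr %p
  %b = load i32, ptr %p
  %s = add i32 %a, %b
  ret i32 %s
}

; No memory attribute on the declaration, so memory(readwrite) is implied.
declare void @opaque(ptr)

define void @caller(ptr %p) {
  ; The call-site annotation narrows the effects of this particular call:
  ; read & readwrite = read.
  call void @opaque(ptr %p) memory(read)
  ret void
}
```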

minsize

This attribute suggests that optimization passes and code generator passes make choices that keep the code size of this function as small as possible and perform optimizations that may sacrifice runtime performance in order to minimize the size of the generated code. This attribute is incompatible with the optdebug and optnone attributes.

naked

This attribute disables prologue / epilogue emission for the function. This can have very system-specific consequences. The arguments of a naked function cannot be referenced through IR values.

"no-inline-line-tables"

When this attribute is set to true, the inliner discards source locations when inlining code and instead uses the source location of the call site. Breakpoints set on code that was inlined into the current function will not fire during the execution of the inlined call sites. If the debugger stops inside an inlined call site, it will appear to be stopped at the outermost inlined call site.

no-jump-tables

When this attribute is set to true, the jump tables and lookup tables that can be generated from a switch case lowering are disabled.

nobuiltin

This indicates that the callee function at a call site is not recognized as a built-in function. LLVM will retain the original call and not replace it with equivalent code based on the semantics of the built-in function, unless the call site uses the builtin attribute. This is valid at call sites and on function declarations and definitions.
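The interaction between nobuiltin and builtin can be sketched as follows. Declaring @malloc with nobuiltin and the function @f are choices made only for this example.

```llvm
; The declaration opts out of built-in recognition by default.
declare ptr @malloc(i64) nobuiltin

; Hypothetical caller.
define ptr @f() {
  ; This direct call opts back in: it may be treated as the built-in malloc.
  %p = call ptr @malloc(i64 16) builtin
  ret ptr %p
}
```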

nocallback

This attribute indicates that the function is only allowed to jump back into the caller’s module by a return or an exception, and is not allowed to jump back by invoking a callback function, a direct, possibly transitive, external function call, use of longjmp, or other means. It is a compiler hint that is used at module level to improve dataflow analysis, dropped during linking, and has no effect on functions defined in the current module.

nodivergencesource

A call to this function is not a source of divergence. In uniformity analysis, a source of divergence is an instruction that generates divergence even if its inputs are uniform. A call with no further information would normally be considered a source of divergence; setting this attribute on a function means that a call to it is not a source of divergence.

noduplicate

This attribute indicates that calls to the function cannot be duplicated. A call to a noduplicate function may be moved within its parent function, but may not be duplicated within its parent function.

A function containing a noduplicate call may still be an inlining candidate, provided that the call is not duplicated by inlining. That implies that the function has internal linkage and only has one call site, so the original call is dead after inlining.

nofree

This function attribute indicates that the function does not, directly or transitively, call a memory-deallocation function (free, for example) on a memory allocation which existed before the call.

As a result, uncaptured pointers that are known to be dereferenceable prior to a call to a function with the nofree attribute are still known to be dereferenceable after the call. The capturing condition is necessary in environments where the function might communicate the pointer to another thread which then deallocates the memory. Alternatively, nosync would ensure such communication cannot happen and even captured pointers cannot be freed by the function.

A nofree function is explicitly allowed to free memory which it allocated or (if not nosync) arrange for another thread to free memory on its behalf. As a result, perhaps surprisingly, a nofree function can return a pointer to a previously deallocated allocated object.

noimplicitfloat

Disallows implicit floating-point code. This inhibits optimizations that use floating-point code and floating-point registers for operations that are not nominally floating-point. LLVM instructions that perform floating-point operations or require access to floating-point registers may still cause floating-point code to be generated.

Also inhibits optimizations that create SIMD/vector code and registers from scalar code such as vectorization or memcpy/memset optimization. This includes integer vectors. Vector instructions present in IR may still cause vector code to be generated.

noinline

This attribute indicates that the inliner should never inline this function in any situation. This attribute may not be used together with the alwaysinline attribute.

nomerge

This attribute indicates that calls to this function should never be merged during optimization. For example, it will prevent tail merging otherwise identical code sequences that raise an exception or terminate the program. Tail merging normally reduces the precision of source location information, making stack traces less useful for debugging. This attribute gives the user control over the tradeoff between code size and debug information precision.

nonlazybind

This attribute suppresses lazy symbol binding for the function. This may make calls to the function faster, at the cost of extra program startup time if the function is not called during program startup.

noprofile

This function attribute prevents instrumentation based profiling, used for coverage or profile based optimization, from being added to a function. It also blocks inlining if the caller and callee have different values of this attribute.

skipprofile

This function attribute prevents instrumentation based profiling, used for coverage or profile based optimization, from being added to a function. This attribute does not restrict inlining, so instrumented instructions could end up in this function.

noredzone

This attribute indicates that the code generator should not use a red zone, even if the target-specific ABI normally permits it.

indirect-tls-seg-refs

This attribute indicates that the code generator should not use direct TLS access through segment registers, even if the target-specific ABI normally permits it.

noreturn

This function attribute indicates that the function never returns normally, that is, through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. Annotated functions may still raise an exception; in particular, nounwind is not implied.

norecurse

This function attribute indicates that the function does not call itself either directly or indirectly down any possible call path. This produces undefined behavior at runtime if the function ever does recurse.

willreturn

This function attribute indicates that a call of this function will either exhibit undefined behavior or come back and continue execution at a point in the existing call stack that includes the current invocation. Annotated functions may still raise an exception; in particular, nounwind is not implied. If an invocation of an annotated function does not return control back to a point in the call stack, the behavior is undefined.
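A short sketch contrasting noreturn and willreturn. The declaration of the C library function abort is annotated here for illustration; @fail and @identity are hypothetical.

```llvm
declare void @abort() noreturn nounwind

; Hypothetical function that never returns normally: control ends in a call
; to a noreturn function followed by unreachable.
define void @fail() noreturn nounwind {
  call void @abort()
  unreachable
}

; Hypothetical function that always comes back to its caller and never
; calls itself.
define i32 @identity(i32 %x) willreturn norecurse nounwind {
  ret i32 %x
}
```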

nosync

This function attribute indicates that the function does not communicate (synchronize) with another thread through memory or other well-defined means. Synchronization is considered possible in the presence of atomic accesses that enforce an order (that is, not “unordered” or “monotonic”), volatile accesses, and convergent function calls.

Note that convergent operations can involve communication that is considered to be not through memory and does not necessarily imply an ordering between threads for the purposes of the memory model. Therefore, an operation can be both convergent and nosync.

If a nosync function does ever synchronize with another thread, the behavior is undefined.

nounwind

This function attribute indicates that the function never raises an exception. If the function does raise an exception, its runtime behavior is undefined. However, functions marked nounwind may still trap or generate asynchronous exceptions. Exception handling schemes that are recognized by LLVM to handle asynchronous exceptions, such as SEH, will still provide their implementation defined semantics.

nosanitize_bounds

This attribute indicates that bounds checking sanitizer instrumentationis disabled for this function.

nosanitize_coverage

This attribute indicates that SanitizerCoverage instrumentation is disabledfor this function.

null_pointer_is_valid

Ifnull_pointer_is_valid is set, then thenull addressin address-space 0 is considered to be a valid address for memory loads andstores. Any analysis or optimization should not treat dereferencing apointer tonull as undefined behavior in this function.Note: Comparing address of a global variable tonull may stillevaluate to false because of a limitation in querying this attribute insideconstant expressions.

optdebug

This attribute suggests that optimization passes and code generator passes should make choices that try to preserve debug info without significantly degrading runtime performance. This attribute is incompatible with the minsize, optsize, and optnone attributes.

optforfuzzing

This attribute indicates that this function should be optimized for maximum fuzzing signal.

optnone

This function attribute indicates that most optimization passes will skip this function, with the exception of interprocedural optimization passes. Code generation defaults to the “fast” instruction selector. This attribute cannot be used together with the alwaysinline attribute; this attribute is also incompatible with the minsize, optsize, and optdebug attributes.

This attribute requires the noinline attribute to be specified on the function as well, so the function is never inlined into any caller. Only functions with the alwaysinline attribute are valid candidates for inlining into the body of this function.
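The pairing requirement can be illustrated with a short sketch. The function name @debug_me is an assumption made for illustration only:

```llvm
; Hypothetical function: optnone is accepted only together with noinline.
define i32 @debug_me(i32 %x) noinline optnone {
entry:
  %y = add i32 %x, 1
  ret i32 %y
}
```

Dropping noinline from this definition would be expected to make the module fail verification, since optnone requires it.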

optsize

This attribute suggests that optimization passes and code generator passes make choices that keep the code size of this function low, and otherwise do optimizations specifically to reduce code size as long as they do not significantly impact runtime performance. This attribute is incompatible with the optdebug and optnone attributes.

"patchable-function"

This attribute tells the code generator that the code generated for this function needs to follow certain conventions that make it possible for a runtime function to patch over it later. The exact effect of this attribute depends on its string value, for which there currently is one legal possibility:

  • "prologue-short-redirect" - This style of patchable function is intended to support patching a function prologue to redirect control away from the function in a thread safe manner. It guarantees that the first instruction of the function will be large enough to accommodate a short jump instruction, and will be sufficiently aligned to allow being fully changed via an atomic compare-and-swap instruction. While the first requirement can be satisfied by inserting a large enough NOP, LLVM can and will try to re-purpose an existing instruction (i.e. one that would have to be emitted anyway) as the patchable instruction larger than a short jump.

    "prologue-short-redirect" is currently only supported on x86-64.

This attribute by itself does not imply restrictions on inter-procedural optimizations. Any semantic effects that the patching may have must be conveyed separately via the linkage type.
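A minimal sketch of a function carrying this attribute follows. The function name @patch_target is hypothetical; only the attribute key and its string value come from the description above:

```llvm
; Hypothetical function whose prologue can later be redirected by a runtime.
define void @patch_target() "patchable-function"="prologue-short-redirect" {
entry:
  ret void
}
```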

"probe-stack"

This attribute indicates that the function will trigger a guard region at the end of the stack. It ensures that accesses to the stack must be no further apart than the size of the guard region to a previous access of the stack. It takes one required string value, the name of the stack probing function that will be called.

If a function that has a "probe-stack" attribute is inlined into a function with another "probe-stack" attribute, the resulting function has the "probe-stack" attribute of the caller. If a function that has a "probe-stack" attribute is inlined into a function that has no "probe-stack" attribute at all, the resulting function has the "probe-stack" attribute of the callee.

"stack-probe-size"

This attribute controls the behavior of stack probes: either the "probe-stack" attribute, or ABI-required stack probes, if any. It defines the size of the guard region. It ensures that if the function may use more stack space than the size of the guard region, a stack probing sequence will be emitted. It takes one required integer value, which is 4096 by default.

If a function that has a "stack-probe-size" attribute is inlined into a function with another "stack-probe-size" attribute, the resulting function has the "stack-probe-size" attribute that has the lower numeric value. If a function that has a "stack-probe-size" attribute is inlined into a function that has no "stack-probe-size" attribute at all, the resulting function has the "stack-probe-size" attribute of the callee.
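The two stack probing attributes are typically combined. The sketch below assumes a probing routine named __my_probe and a function named @big_frame; both names are hypothetical and chosen only for illustration:

```llvm
; Hypothetical function with a 16 KiB frame. The guard region is declared
; to be 8192 bytes, so the frame exceeds it and a probing sequence that
; calls the (assumed) routine __my_probe would be expected.
define void @big_frame() "probe-stack"="__my_probe" "stack-probe-size"="8192" {
entry:
  %buf = alloca [16384 x i8], align 16
  store volatile i8 0, ptr %buf, align 16
  ret void
}
```

Under the inlining rules above, inlining @big_frame into a caller that carries "stack-probe-size"="4096" would leave the caller with the lower value, 4096.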

"no-stack-arg-probe"

This attribute disables ABI-required stack probes, if any.

returns_twice

This attribute indicates that this function can return twice. The C setjmp is an example of such a function. The compiler disables some optimizations (like tail calls) in the caller of these functions.

safestack

This attribute indicates that SafeStack protection is enabled for this function.

If a function that has a safestack attribute is inlined into a function that doesn’t have a safestack attribute or which has an ssp, sspstrong or sspreq attribute, then the resulting function will have a safestack attribute.

sanitize_address

This attribute indicates that AddressSanitizer checks (dynamic address safety analysis) are enabled for this function.

sanitize_memory

This attribute indicates that MemorySanitizer checks (dynamic detection of accesses to uninitialized memory) are enabled for this function.

sanitize_thread

This attribute indicates that ThreadSanitizer checks (dynamic thread safety analysis) are enabled for this function.

sanitize_hwaddress

This attribute indicates that HWAddressSanitizer checks (dynamic address safety analysis based on tagged pointers) are enabled for this function.

sanitize_memtag

This attribute indicates that MemTagSanitizer checks (dynamic address safety analysis based on Armv8 MTE) are enabled for this function.

sanitize_realtime

This attribute indicates that RealtimeSanitizer checks (realtime safety analysis - no allocations, syscalls or exceptions) are enabled for this function.

sanitize_realtime_blocking

This attribute indicates that RealtimeSanitizer should error immediately if the attributed function is called during invocation of a function attributed with sanitize_realtime. This attribute is incompatible with the sanitize_realtime attribute.

speculative_load_hardening

This attribute indicates that Speculative Load Hardening should be enabled for the function body.

Speculative Load Hardening is a best-effort mitigation against information leak attacks that make use of control flow misspeculation, specifically misspeculation of whether a branch is taken or not. Typically vulnerabilities enabling such attacks are classified as “Spectre variant #1”. Notably, this does not attempt to mitigate against misspeculation of branch target, classified as “Spectre variant #2” vulnerabilities.

When inlining, the attribute is sticky. Inlining a function that carries this attribute will cause the caller to gain the attribute. This is intended to provide a maximally conservative model where the code in a function annotated with this attribute will always (even after inlining) end up hardened.

speculatable

This function attribute indicates that the function does not have any effects besides calculating its result and does not have undefined behavior. Note that speculatable is not enough to conclude that along any particular execution path the number of calls to this function will not be externally observable. This attribute is only valid on functions and declarations, not on individual call sites. If a function is incorrectly marked as speculatable and really does exhibit undefined behavior, the undefined behavior may be observed even if the call site is dead code.

ssp

This attribute indicates that the function should emit a stack smashing protector. It is in the form of a “canary”: a random value placed on the stack before the local variables that’s checked upon return from the function to see if it has been overwritten. A heuristic is used to determine if a function needs stack protectors or not. The heuristic used will enable protectors for functions with:

  • Character arrays larger than ssp-buffer-size (default 8).

  • Aggregates containing character arrays larger than ssp-buffer-size.

  • Calls to alloca() with variable sizes or constant sizes greater than ssp-buffer-size.

Variables that are identified as requiring a protector will be arranged on the stack such that they are adjacent to the stack protector guard.

If a function with an ssp attribute is inlined into a calling function, the attribute is not carried over to the calling function.

sspstrong

This attribute indicates that the function should emit a stack smashing protector. This attribute causes a strong heuristic to be used when determining if a function needs stack protectors. The strong heuristic will enable protectors for functions with:

  • Arrays of any size and type

  • Aggregates containing an array of any size and type.

  • Calls to alloca().

  • Local variables that have had their address taken.

Variables that are identified as requiring a protector will be arranged on the stack such that they are adjacent to the stack protector guard. The specific layout rules are:

  1. Large arrays and structures containing large arrays (>= ssp-buffer-size) are closest to the stack protector.

  2. Small arrays and structures containing small arrays (< ssp-buffer-size) are 2nd closest to the protector.

  3. Variables that have had their address taken are 3rd closest to the protector.

This overrides the ssp function attribute.

If a function with an sspstrong attribute is inlined into a calling function which has an ssp attribute, the calling function’s attribute will be upgraded to sspstrong.

sspreq

This attribute indicates that the function should always emit a stack smashing protector. This overrides the ssp and sspstrong function attributes.

Variables that are identified as requiring a protector will be arranged on the stack such that they are adjacent to the stack protector guard. The specific layout rules are:

  1. Large arrays and structures containing large arrays (>= ssp-buffer-size) are closest to the stack protector.

  2. Small arrays and structures containing small arrays (< ssp-buffer-size) are 2nd closest to the protector.

  3. Variables that have had their address taken are 3rd closest to the protector.

If a function with an sspreq attribute is inlined into a calling function which has an ssp or sspstrong attribute, the calling function’s attribute will be upgraded to sspreq.
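The difference between the ssp, sspstrong and sspreq heuristics can be sketched with three otherwise identical functions. All function names below are hypothetical, and the comments describe what the heuristics listed above would be expected to decide, not verified compiler output:

```llvm
; Each function holds a 4-byte character array, which is smaller than the
; default ssp-buffer-size of 8.

; ssp: the array is below the threshold, so no protector is expected.
define i8 @small_buf_ssp(i8 %c) ssp {
entry:
  %buf = alloca [4 x i8], align 1
  store i8 %c, ptr %buf, align 1
  %v = load i8, ptr %buf, align 1
  ret i8 %v
}

; sspstrong: arrays of any size and type qualify, so a protector is expected.
define i8 @small_buf_sspstrong(i8 %c) sspstrong {
entry:
  %buf = alloca [4 x i8], align 1
  store i8 %c, ptr %buf, align 1
  %v = load i8, ptr %buf, align 1
  ret i8 %v
}

; sspreq: a protector is always emitted, regardless of the function body.
define i8 @small_buf_sspreq(i8 %c) sspreq {
entry:
  %buf = alloca [4 x i8], align 1
  store i8 %c, ptr %buf, align 1
  %v = load i8, ptr %buf, align 1
  ret i8 %v
}
```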

strictfp

This attribute indicates that the function was called from a scope that requires strict floating-point semantics. LLVM will not attempt any optimizations that require assumptions about the floating-point rounding mode or that might alter the state of floating-point status flags that might otherwise be set or cleared by calling this function. LLVM will not introduce any new floating-point instructions that may trap.

"denormal-fp-math"

This indicates the denormal (subnormal) handling that may be assumed for the default floating-point environment. This is a comma separated pair. The elements may be one of "ieee", "preserve-sign", "positive-zero", or "dynamic". The first entry indicates the flushing mode for the result of floating point operations. The second indicates the handling of denormal inputs to floating point instructions. For compatibility with older bitcode, if the second value is omitted, both input and output modes will assume the same mode.

If this attribute is not specified, the default is "ieee,ieee".

If the output mode is "preserve-sign", or "positive-zero", denormal outputs may be flushed to zero by standard floating-point operations. It is not mandated that flushing to zero occurs, but if a denormal output is flushed to zero, it must respect the sign mode. Not all targets support all modes.

If the mode is "dynamic", the behavior is derived from the dynamic state of the floating-point environment. Transformations which depend on the behavior of denormal values should not be performed.

While this indicates the expected floating point mode the function will be executed with, this does not make any attempt to ensure the mode is consistent. User or platform code is expected to set the floating point mode appropriately before function entry.

If the input mode is "preserve-sign", or "positive-zero", a floating-point operation must treat any input denormal value as zero. In some situations, if an instruction does not respect this mode, the input may need to be converted to 0 as if by @llvm.canonicalize during lowering for correctness.
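A short sketch of the comma separated pair, using a hypothetical function named @halve: the first element is the output mode and the second is the input mode.

```llvm
; Hypothetical function. Denormal results may be flushed to a
; sign-preserving zero (output mode "preserve-sign"), while denormal
; inputs keep IEEE behavior (input mode "ieee").
define float @halve(float %x) "denormal-fp-math"="preserve-sign,ieee" {
entry:
  %r = fmul float %x, 5.000000e-01
  ret float %r
}
```

Writing "denormal-fp-math"="preserve-sign" with a single element would, per the compatibility rule above, apply the same mode to both inputs and outputs.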

"denormal-fp-math-f32"

Same as "denormal-fp-math", but only controls the behavior of the 32-bit float type (or vectors of 32-bit floats). If both are present, this overrides "denormal-fp-math". Not all targets support separately setting the denormal mode per type, and no attempt is made to diagnose unsupported uses. Currently this attribute is respected by the AMDGPU and NVPTX backends.

"thunk"

This attribute indicates that the function will delegate to some other function with a tail call. The prototype of a thunk should not be used for optimization purposes. The caller is expected to cast the thunk prototype to match the thunk target prototype.

uwtable[(sync|async)]

This attribute indicates that the ABI being targeted requires that an unwind table entry be produced for this function even if we can show that no exceptions pass through it. This is normally the case for the ELF x86-64 ABI, but it can be disabled for some compilation units. The optional parameter describes what kind of unwind tables to generate: sync for normal unwind tables, async for asynchronous (instruction precise) unwind tables. Without the parameter, the attribute uwtable is equivalent to uwtable(async).
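The parameterized and bare spellings can be sketched as follows; the function names are hypothetical:

```llvm
; Hypothetical functions showing the three spellings.
define void @sync_tables() uwtable(sync) {
  ret void
}

define void @async_tables() uwtable(async) {
  ret void
}

; No parameter: equivalent to uwtable(async).
define void @default_tables() uwtable {
  ret void
}
```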

nocf_check

This attribute indicates that no control-flow check will be performed on the attributed entity. It disables -fcf-protection=<> for a specific entity to fine grain the HW control flow protection mechanism. The flag is target independent and currently appertains to a function or function pointer.

shadowcallstack

This attribute indicates that the ShadowCallStack checks are enabled for the function. The instrumentation checks that the return address for the function has not changed between the function prolog and epilog. It is currently x86_64-specific.

mustprogress

This attribute indicates that the function is required to return, unwind, or interact with the environment in an observable way e.g. via a volatile memory access, I/O, or other synchronization. The mustprogress attribute is intended to model the requirements of the first section of [intro.progress] of the C++ Standard. As a consequence, a loop in a function with the mustprogress attribute can be assumed to terminate if it does not interact with the environment in an observable way, and terminating loops without side-effects can be removed. If a mustprogress function does not satisfy this contract, the behavior is undefined. If a mustprogress function calls a function not marked mustprogress, and that function never returns, the program is well-defined even if there isn’t any other observable progress. Note that willreturn implies mustprogress.
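To make the consequence for loops concrete, consider the following sketch. The function name @count_to is hypothetical. The loop steps by 2, so for an odd %n it would never reach the exit condition, yet it has no observable side effects:

```llvm
; Hypothetical function. Because of mustprogress, an optimizer may assume
; that this side-effect-free loop terminates and may remove it, even
; though termination cannot be proven for every value of %n.
define void @count_to(i32 %n) mustprogress {
entry:
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %i.next = add i32 %i, 2
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop

exit:
  ret void
}
```

If the function is actually called with a value for which the loop never exits, the contract is violated and the behavior is undefined.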

"warn-stack-size"="<threshold>"

This attribute sets a threshold to emit diagnostics once the frame size is known should the frame size exceed the specified value. It takes one required integer value, which should be a non-negative integer, and less than UINT_MAX. It’s unspecified which threshold will be used when duplicate definitions are linked together with differing values.

vscale_range(<min>[,<max>])

This function attribute indicates vscale is a power-of-two within a specified range. min must be a power-of-two that is greater than 0. When specified, max must be a power-of-two greater-than-or-equal to min or 0 to signify an unbounded maximum. The syntax vscale_range(<val>) can be used to set both min and max to the same value. Functions that don’t include this attribute make no assumptions about the value of vscale.
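A brief sketch of the attribute, using a hypothetical function @lanes_per_vector that queries vscale through the llvm.vscale intrinsic:

```llvm
declare i64 @llvm.vscale.i64()

; Hypothetical function: within it, vscale may be assumed to be a power
; of two between 1 and 16 inclusive.
define i64 @lanes_per_vector() vscale_range(1,16) {
entry:
  %vs = call i64 @llvm.vscale.i64()
  ; Number of i32 lanes in a <vscale x 4 x i32> vector.
  %lanes = shl i64 %vs, 2
  ret i64 %lanes
}
```

Writing vscale_range(4) instead would fix both min and max to 4, and vscale_range(2,0) would give a lower bound of 2 with no upper bound.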

"nooutline"

This attribute indicates that outlining passes should not modify the function.

Call Site Attributes

In addition to function attributes the following call site only attributes are supported:

vector-function-abi-variant

This attribute can be attached to a call to list the vector functions associated to the function. Notice that the attribute cannot be attached to an invoke or a callbr instruction. The attribute consists of a comma separated list of mangled names. The order of the list does not imply preference (it is logically a set). The compiler is free to pick any listed vector function of its choosing.

The syntax for the mangled names is as follows:

_ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)]

When present, the attribute informs the compiler that the function <scalar_name> has a corresponding vector variant that can be used to perform the concurrent invocation of <scalar_name> on vectors. The shape of the vector function is described by the tokens between the prefix _ZGV and the <scalar_name> token. The standard name of the vector function is _ZGV<isa><mask><vlen><parameters>_<scalar_name>. When present, the optional token (<vector_redirection>) informs the compiler that a custom name is provided in addition to the standard one (custom names can be provided for example via the use of declare variant in OpenMP 5.0). The declaration of the variant must be present in the IR Module. The signature of the vector variant is determined by the rules of the Vector Function ABI (VFABI) specifications of the target. For Arm and X86, the VFABI can be found at https://github.com/ARM-software/abi-aa and https://software.intel.com/content/www/us/en/develop/download/vector-simd-function-abi.html, respectively.

For X86 and Arm targets, the values of the tokens in the standard name are those that are defined in the VFABI. LLVM has an internal <isa> token that can be used to create scalar-to-vector mappings for functions that are not directly associated to any of the target ISAs (for example, some of the mappings stored in the TargetLibraryInfo). Valid values for the <isa> token are:

<isa>:= b | c | d | e  -> X86 SSE, AVX, AVX2, AVX512
      | n | s          -> Armv8 Advanced SIMD, SVE
      | __LLVM__       -> Internal LLVM Vector ISA

For all targets currently supported (x86, Arm and Internal LLVM), the remaining tokens can have the following values:

<mask>:= M | N                 -> mask | no mask

<vlen>:= number                -> number of lanes
       | x                     -> VLA (Vector Length Agnostic)

<parameters>:= v               -> vector
             | l | l <number>  -> linear
             | R | R <number>  -> linear with ref modifier
             | L | L <number>  -> linear with val modifier
             | U | U <number>  -> linear with uval modifier
             | ls <pos>        -> runtime linear
             | Rs <pos>        -> runtime linear with ref modifier
             | Ls <pos>        -> runtime linear with val modifier
             | Us <pos>        -> runtime linear with uval modifier
             | u               -> uniform

<scalar_name>:= name of the scalar function

<vector_redirection>:= optional, custom name of the vector function

preallocated(<ty>)

This attribute is required on calls to llvm.call.preallocated.arg and cannot be used on any other call. See llvm.call.preallocated.arg for more details.
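As a sketch of the vector-function-abi-variant attribute described earlier in this section, the module below associates a scalar call to @sin with a two-lane vector variant. The names @my_vector_sin and @caller are hypothetical and chosen only for illustration. The mangled name decodes as n (Armv8 Advanced SIMD), N (no mask), 2 (two lanes), v (one vector parameter), followed by the scalar name and the custom redirection name:

```llvm
declare double @sin(double)

; Hypothetical vector variant; its declaration must be present in the module.
declare <2 x double> @my_vector_sin(<2 x double>)

define double @caller(double %x) {
entry:
  %r = call double @sin(double %x) #0
  ret double %r
}

attributes #0 = { "vector-function-abi-variant"="_ZGVnN2v_sin(my_vector_sin)" }
```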

Global Attributes

Attributes may be set to communicate additional information about a global variable. Unlike function attributes, attributes on a global variable are grouped into a single attribute group.

no_sanitize_address

This attribute indicates that the global variable should not have AddressSanitizer instrumentation applied to it, because it was annotated with __attribute__((no_sanitize("address"))), __attribute__((disable_sanitizer_instrumentation)), or included in the -fsanitize-ignorelist file.

no_sanitize_hwaddress

This attribute indicates that the global variable should not have HWAddressSanitizer instrumentation applied to it, because it was annotated with __attribute__((no_sanitize("hwaddress"))), __attribute__((disable_sanitizer_instrumentation)), or included in the -fsanitize-ignorelist file.

sanitize_memtag

This attribute indicates that the global variable should have AArch64 memory tags (MTE) instrumentation applied to it. This attribute causes the suppression of certain optimizations, like GlobalMerge, as well as ensuring extra directives are emitted in the assembly and extra bits of metadata are placed in the object file so that the linker can ensure the accesses are protected by MTE. This attribute is added by clang when -fsanitize=memtag-globals is provided, as long as the global is not marked with __attribute__((no_sanitize("memtag"))), __attribute__((disable_sanitizer_instrumentation)), or included in the -fsanitize-ignorelist file. The AArch64 Globals Tagging pass may remove this attribute when it’s not possible to tag the global (e.g. it’s a TLS variable).

sanitize_address_dyninit

This attribute indicates that the global variable, when instrumented with AddressSanitizer, should be checked for ODR violations. This attribute is applied to global variables that are dynamically initialized according to C++ rules.
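The sanitizer-related global attributes above are written after the initializer of a global variable. A minimal sketch, with hypothetical variable names:

```llvm
; Hypothetical globals, one per attribute.
@not_asan_checked   = global i32 0, no_sanitize_address
@not_hwasan_checked = global i32 0, no_sanitize_hwaddress
@mte_tagged         = global i32 0, sanitize_memtag
@dynamically_init   = global i32 0, sanitize_address_dyninit
```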

Operand Bundles

Operand bundles are tagged sets of SSA values or metadata strings that can be associated with certain LLVM instructions (currently only calls and invokes). In a way they are like metadata, but dropping them is incorrect and will change program semantics.

Syntax:

operand bundle set ::= '[' operand bundle (, operand bundle )* ']'
operand bundle ::= tag '(' [ bundle operand ] (, bundle operand )* ')'
bundle operand ::= SSA value | metadata string
tag ::= string constant

Operand bundles are not part of a function’s signature, and a given function may be called from multiple places with different kinds of operand bundles. This reflects the fact that the operand bundles are conceptually a part of the call (or invoke), not the callee being dispatched to.

Operand bundles are a generic mechanism intended to support runtime-introspection-like functionality for managed languages. While the exact semantics of an operand bundle depend on the bundle tag, there are certain limitations to how much the presence of an operand bundle can influence the semantics of a program. These restrictions are described as the semantics of an “unknown” operand bundle. As long as the behavior of an operand bundle is describable within these restrictions, LLVM does not need to have special knowledge of the operand bundle to not miscompile programs containing it.

  • The bundle operands for an unknown operand bundle escape in unknown ways before control is transferred to the callee or invokee.

  • Calls and invokes with operand bundles have unknown read / write effect on the heap on entry and exit (even if the call target specifies a memory attribute), unless they’re overridden with callsite specific attributes.

  • An operand bundle at a call site cannot change the implementation of the called function. Inter-procedural optimizations work as usual as long as they take into account the first two properties.
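As an illustrative sketch of an “unknown” operand bundle, the call below carries a bundle with a made-up tag. The tag "my-runtime-tag" and the function names are assumptions for illustration only and carry no meaning to LLVM:

```llvm
declare void @callee(ptr)

; Hypothetical caller. LLVM has no special knowledge of "my-runtime-tag",
; so it treats %p as escaping and the call as reading and writing the heap
; in unknown ways, as described by the restrictions above.
define void @caller(ptr %p) {
entry:
  call void @callee(ptr %p) [ "my-runtime-tag"(i32 7, ptr %p) ]
  ret void
}
```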

More specific types of operand bundles are described below.

Deoptimization Operand Bundles

Deoptimization operand bundles are characterized by the "deopt" operand bundle tag. These operand bundles represent an alternate “safe” continuation for the call site they’re attached to, and can be used by a suitable runtime to deoptimize the compiled frame at the specified call site. There can be at most one "deopt" operand bundle attached to a call site. Exact details of deoptimization are out of scope for the language reference, but it usually involves rewriting a compiled frame into a set of interpreted frames.

From the compiler’s perspective, deoptimization operand bundles make the call sites they’re attached to at least readonly. They read through all of their pointer typed operands (even if they’re not otherwise escaped) and the entire visible heap. Deoptimization operand bundles do not capture their operands except during deoptimization, in which case control will not be returned to the compiled frame.

The inliner knows how to inline through calls that have deoptimization operand bundles. Just like inlining through a normal call site involves composing the normal and exceptional continuations, inlining through a call site with a deoptimization operand bundle needs to appropriately compose the “safe” deoptimization continuation. The inliner does this by prepending the parent’s deoptimization continuation to every deoptimization continuation in the inlined body. E.g. inlining @f into @g in the following example

define void @f() {
  call void @x()  ;; no deopt state
  call void @y() [ "deopt"(i32 10) ]
  call void @y() [ "deopt"(i32 10), "unknown"(ptr null) ]
  ret void
}

define void @g() {
  call void @f() [ "deopt"(i32 20) ]
  ret void
}

will result in

define void @g() {
  call void @x()  ;; still no deopt state
  call void @y() [ "deopt"(i32 20, i32 10) ]
  call void @y() [ "deopt"(i32 20, i32 10), "unknown"(ptr null) ]
  ret void
}

It is the frontend’s responsibility to structure or encode the deoptimization state in a way that syntactically prepending the caller’s deoptimization state to the callee’s deoptimization state is semantically equivalent to composing the caller’s deoptimization continuation after the callee’s deoptimization continuation.

Funclet Operand Bundles

Funclet operand bundles are characterized by the "funclet" operand bundle tag. These operand bundles indicate that a call site is within a particular funclet. There can be at most one "funclet" operand bundle attached to a call site and it must have exactly one bundle operand.

If any funclet EH pads have been “entered” but not “exited” (per the description in the EH doc), it is undefined behavior to execute a call or invoke which:

  • does not have a "funclet" bundle and is not a call to a nounwind intrinsic, or

  • has a "funclet" bundle whose operand is not the most-recently-entered not-yet-exited funclet EH pad.

Similarly, if no funclet EH pads have been entered-but-not-yet-exited, executing a call or invoke with a "funclet" bundle is undefined behavior.
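The rules above can be sketched with a single cleanup funclet. The function names @may_throw, @cleanup and @f are hypothetical, and @__CxxFrameHandler3 is used here only as an example of a funclet-based personality routine:

```llvm
declare void @may_throw()
declare void @cleanup() nounwind
declare i32 @__CxxFrameHandler3(...)

define void @f() personality ptr @__CxxFrameHandler3 {
entry:
  ; No funclet pad has been entered here, so no "funclet" bundle is used.
  invoke void @may_throw()
          to label %done unwind label %pad

pad:
  %cp = cleanuppad within none []
  ; Inside the funclet: the bundle names the most recently entered pad.
  call void @cleanup() [ "funclet"(token %cp) ]
  cleanupret from %cp unwind to caller

done:
  ret void
}
```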

GC Transition Operand Bundles

GC transition operand bundles are characterized by the "gc-transition" operand bundle tag. These operand bundles mark a call as a transition between a function with one GC strategy to a function with a different GC strategy. If coordinating the transition between GC strategies requires additional code generation at the call site, these bundles may contain any values that are needed by the generated code. For more details, see GC Transitions.

The bundle contains an arbitrary list of Values which need to be passed to GC transition code. They will be lowered and passed as operands to the appropriate GC_TRANSITION nodes in the selection DAG. It is assumed that these arguments must be available before and after (but not necessarily during) the execution of the callee.

Assume Operand Bundles

Operand bundles on an llvm.assume allow representing assumptions, such as that a parameter attribute or a function attribute holds for a certain value at a certain location. Operand bundles enable assumptions that are either hard or impossible to represent as a boolean argument of an llvm.assume.

An assume operand bundle has the form:

"<tag>"([ <arguments> ])

In the case of function or parameter attributes, the operand bundle has the restricted form:

"<tag>"([ <holds for value> [, <attribute argument>] ])

  • The tag of the operand bundle is usually the name of the attribute that can be assumed to hold. It can also be ignore; this tag doesn’t contain any information and should be ignored.

  • The first argument if present is the value for which the attribute holds.

  • The second argument if present is an argument of the attribute.

If there are no arguments the attribute is a property of the call location.

For example:

call void @llvm.assume(i1 true) ["align"(ptr %val, i32 8)]

allows the optimizer to assume that at the location of the call to llvm.assume, %val has an alignment of at least 8.

call void @llvm.assume(i1 %cond) ["cold"(), "nonnull"(ptr %val)]

allows the optimizer to assume that the llvm.assume call location is cold and that %val may not be null.

Just like for the argument of llvm.assume, if any of the provided guarantees are violated at runtime the behavior is undefined.

While attributes expect constant arguments, assume operand bundles may be provided a dynamic value, for example:

call void @llvm.assume(i1 true) ["align"(ptr %val, i32 %align)]

If the operand bundle value violates any requirements on the attribute value, the behavior is undefined, unless one of the following exceptions applies:

  • "align" operand bundles may specify a non-power-of-two alignment (including a zero alignment). If this is the case, then the pointer value must be a null pointer, otherwise the behavior is undefined.

  • dereferenceable(<n>) operand bundles only guarantee the pointer is dereferenceable at the point of the assumption. The pointer may not be dereferenceable at later points, e.g. because it could have been freed.

In addition to allowing operand bundles encoding function and parameter attributes, an assume operand bundle may also encode a separate_storage operand bundle. This has the form:

separate_storage(<val1>, <val2>)

This indicates that no pointer based on one of its arguments can alias any pointer based on the other.
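A minimal sketch of how such an assumption might be used, with a hypothetical function @store_both:

```llvm
declare void @llvm.assume(i1)

; Hypothetical function. The assumption states that nothing based on %a
; aliases anything based on %b, so an optimizer may conclude that the
; store through %b does not change the value loaded from %a.
define i32 @store_both(ptr %a, ptr %b) {
entry:
  call void @llvm.assume(i1 true) [ "separate_storage"(ptr %a, ptr %b) ]
  store i32 1, ptr %a, align 4
  store i32 2, ptr %b, align 4
  %v = load i32, ptr %a, align 4
  ret i32 %v
}
```

If %a and %b do alias at runtime, the assumption is violated and the behavior is undefined.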

Even if the assumed property can be encoded as a boolean value, like nonnull, using operand bundles to express the property can still have benefits:

  • Attributes that can be expressed via operand bundles are directly the property that the optimizer uses and cares about. Encoding attributes as operand bundles removes the need for an instruction sequence that represents the property (e.g., icmp ne ptr %p, null for nonnull) and for the optimizer to deduce the property from that instruction sequence.

  • Expressing the property using operand bundles makes it easy to identify the use of the value as a use in an llvm.assume. This then simplifies and improves heuristics, e.g., for “use-sensitive” optimizations.

Preallocated Operand Bundles

Preallocated operand bundles are characterized by the "preallocated" operand bundle tag. These operand bundles allow separation of the allocation of the call argument memory from the call site. This is necessary to pass non-trivially copyable objects by value in a way that is compatible with MSVC on some targets. There can be at most one "preallocated" operand bundle attached to a call site and it must have exactly one bundle operand, which is a token generated by @llvm.call.preallocated.setup. A call with this operand bundle should not adjust the stack before entering the function, as that will have been done by one of the @llvm.call.preallocated.* intrinsics.

%foo = type { i64, i32 }

...

%t = call token @llvm.call.preallocated.setup(i32 1)
%a = call ptr @llvm.call.preallocated.arg(token %t, i32 0) preallocated(%foo)
; initialize the memory that %a points to
call void @bar(i32 42, ptr preallocated(%foo) %a) ["preallocated"(token %t)]

GC Live Operand Bundles

A “gc-live” operand bundle is only valid on a gc.statepoint intrinsic. The operand bundle must contain every pointer to a garbage collected object which potentially needs to be updated by the garbage collector.

When lowered, any relocated value will be recorded in the corresponding stackmap entry. See the intrinsic description for further details.

ObjC ARC Attached Call Operand Bundles

A "clang.arc.attachedcall" operand bundle on a call indicates the call is implicitly followed by a marker instruction and a call to an ObjC runtime function that uses the result of the call. The operand bundle takes a mandatory pointer to the runtime function (@objc_retainAutoreleasedReturnValue or @objc_unsafeClaimAutoreleasedReturnValue). The return value of a call with this bundle is used by a call to @llvm.objc.clang.arc.noop.use unless the called function’s return type is void, in which case the operand bundle is ignored.

; The marker instruction and a runtime function call are inserted after the call
; to @foo.
call ptr @foo() [ "clang.arc.attachedcall"(ptr @objc_retainAutoreleasedReturnValue) ]
call ptr @foo() [ "clang.arc.attachedcall"(ptr @objc_unsafeClaimAutoreleasedReturnValue) ]

The operand bundle is needed to ensure the call is immediately followed by the marker instruction and the ObjC runtime call in the final output.

Pointer Authentication Operand Bundles

Pointer Authentication operand bundles are characterized by the "ptrauth" operand bundle tag. They are described in the Pointer Authentication document.

KCFI Operand Bundles

A "kcfi" operand bundle on an indirect call indicates that the call will be preceded by a runtime type check, which validates that the call target is prefixed with a type identifier that matches the operand bundle attribute. For example:

call void %0() ["kcfi"(i32 1234)]

Clang emits KCFI operand bundles and the necessary metadata with -fsanitize=kcfi.

Convergence Control Operand Bundles

A “convergencectrl” operand bundle is only valid on a convergent operation. When present, the operand bundle must contain exactly one value of token type. See the Convergent Operation Semantics document for details.

Module-Level Inline Assembly

Modules may contain “module-level inline asm” blocks, which correspond to the GCC “file scope inline asm” blocks. These blocks are internally concatenated by LLVM and treated as a single unit, but may be separated in the .ll file if desired. The syntax is very simple:

module asm "inline asm code goes here"
module asm "more can go here"

The strings can contain any character by escaping non-printable characters. The escape sequence used is simply “\xx” where “xx” is the two digit hex code for the number.

Note that the assembly string must be parseable by LLVM’s integrated assembler (unless it is disabled), even when emitting a .s file.
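The escape rule can be illustrated with a small sketch. The symbol name example_marker is hypothetical, and the directives assume a GNU-style assembler syntax; here \0A encodes a newline and \09 a tab.

```llvm
; Two blocks that LLVM concatenates into one unit of assembly.
module asm ".globl example_marker"
; \0A is a newline and \09 is a tab inside the string.
module asm "example_marker:\0A\09.byte 0x2A"
```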

Data Layout

A module may specify a target specific data layout string that specifies how data is to be laid out in memory. The syntax for the data layout is simply:

target datalayout = "layout specification"

The layout specification consists of a list of specifications separated by the minus sign character (‘-’). Each specification starts with a letter and may include other information after the letter to define some aspect of the data layout. The specifications accepted are as follows:

E

Specifies that the target lays out data in big-endian form. That is, the bits with the most significance have the lowest address location.

e

Specifies that the target lays out data in little-endian form. That is, the bits with the least significance have the lowest address location.

S<size>

Specifies the natural alignment of the stack in bits. Alignment promotion of stack variables is limited to the natural stack alignment to avoid dynamic stack realignment. If omitted, the natural stack alignment defaults to “unspecified”, which does not prevent any alignment promotions.

P<addressspace>

Specifies the address space that corresponds to program memory. Harvard architectures can use this to specify what space LLVM should place things such as functions into. If omitted, the program memory space defaults to the default address space of 0, which corresponds to a Von Neumann architecture that has code and data in the same space.

G<addressspace>

Specifies the address space to be used by default when creating global variables. If omitted, the globals address space defaults to the default address space 0. Note: variable declarations without an address space are always created in address space 0; this property only affects the default value to be used when creating globals without additional contextual information (e.g. in LLVM passes).

A<addressspace>

Specifies the address space of objects created by ‘alloca’. Defaults to the default address space of 0.

p[n]:<size>:<abi>[:<pref>[:<idx>]]

This specifies the properties of a pointer in address space n. The <size> parameter specifies the size of the bitwise representation. For non-integral pointers the representation size may be larger than the address width of the underlying address space (e.g. to accommodate additional metadata). The alignment requirements are specified via the <abi> and <pref>erred alignments parameters. The fourth parameter <idx> is the size of the index that is used for address calculations such as getelementptr. It must be less than or equal to the pointer size. If not specified, the default index size is equal to the pointer size. The index size also specifies the width of addresses in this address space. All sizes are in bits. The address space, n, is optional, and if not specified, denotes the default address space 0. The value of n must be in the range [1,2^24).

i<size>:<abi>[:<pref>]

This specifies the alignment for an integer type of a given bit <size>. The value of <size> must be in the range [1,2^24). For i8, the <abi> value must equal 8, that is, i8 must be naturally aligned.

v<size>:<abi>[:<pref>]

This specifies the alignment for a vector type of a given bit <size>. The value of <size> must be in the range [1,2^24).

f<size>:<abi>[:<pref>]

This specifies the alignment for a floating-point type of a given bit <size>. Only values of <size> that are supported by the target will work. 32 (float) and 64 (double) are supported on all targets; 80 or 128 (different flavors of long double) are also supported on some targets. The value of <size> must be in the range [1,2^24).

a:<abi>[:<pref>]

This specifies the alignment for an object of aggregate type. In addition to the usual requirements for alignment values, the value of <abi> can also be zero, which means one byte alignment.

F<type><abi>

This specifies the alignment for function pointers. The options for <type> are:

  • i: The alignment of function pointers is independent of the alignment of functions, and is a multiple of <abi>.

  • n: The alignment of function pointers is a multiple of the explicit alignment specified on the function, and is a multiple of <abi>.

m:<mangling>

If present, specifies that LLVM names are mangled in the output. Symbols prefixed with the mangling escape character \01 are passed through directly to the assembler without the escape character. The mangling style options are:

  • e: ELF mangling: Private symbols get a .L prefix.

  • l: GOFF mangling: Private symbols get a @ prefix.

  • m: Mips mangling: Private symbols get a $ prefix.

  • o: Mach-O mangling: Private symbols get an L prefix. Other symbols get a _ prefix.

  • x: Windows x86 COFF mangling: Private symbols get the usual prefix. Regular C symbols get a _ prefix. Functions with __stdcall, __fastcall, and __vectorcall have custom mangling that appends @N where N is the number of bytes used to pass parameters. C++ symbols starting with ? are not mangled in any way.

  • w: Windows COFF mangling: Similar to x, except that normal C symbols do not receive a _ prefix.

  • a: XCOFF mangling: Private symbols get an L.. prefix.

n<size1>:<size2>:<size3>...

This specifies a set of native integer widths for the target CPU in bits. For example, it might contain n32 for 32-bit PowerPC, n32:64 for PowerPC 64, or n8:16:32:64 for X86-64. Elements of this set are considered to support most general arithmetic operations efficiently.

ni:<addressspace0>:<addressspace1>:<addressspace2>...

This specifies pointer types with the specified address spaces as Non-Integral Pointer Types. The 0 address space cannot be specified as non-integral.

<abi> is a lower bound on what is required for a type to be considered aligned. This is used in various places, such as:

  • The alignment for loads and stores if none is explicitly given.

  • The alignment used to compute struct layout.

  • The alignment used to compute allocation sizes and thus getelementptr offsets.

  • The alignment below which accesses are considered underaligned.

<pref> allows providing a more optimal alignment that should be used when possible, primarily for alloca and the alignment of global variables. It is an optional value that must be greater than or equal to <abi>. If omitted, the preceding : should also be omitted and <pref> will be equal to <abi>.

Unless explicitly stated otherwise, every alignment specification is provided in bits and must be in the range [1,2^16). The value must be a power of two times the width of a byte (i.e. align = 8 * 2^N).
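To make the role of <abi> in struct layout concrete, here is a minimal sketch. The layout string, the type name %pair, and the function name are assumptions chosen for illustration; the offsets in the comments follow from the rule that each field starts at a multiple of its ABI alignment.

```llvm
target datalayout = "e-i32:32-i64:64"

; With i64:64 the i64 field must start at a multiple of 8 bytes:
;   field 0 (i32) sits at offset 0, followed by 4 bytes of padding,
;   field 1 (i64) sits at offset 8, and the struct occupies 16 bytes.
; With i64:32 instead, field 1 would start at offset 4 and the struct
; would occupy 12 bytes.
%pair = type { i32, i64 }

define ptr @second_field(ptr %p) {
  ; Address of field 1; its byte offset depends on the <abi> values above.
  %f = getelementptr inbounds %pair, ptr %p, i32 0, i32 1
  ret ptr %f
}
```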

When constructing the data layout for a given target, LLVM starts with a default set of specifications which are then (possibly) overridden by the specifications in the datalayout keyword. The default specifications are given in this list:

  • e - little endian

  • p:64:64:64 - 64-bit pointers with 64-bit alignment.

  • p[n]:64:64:64 - Other address spaces are assumed to be the same as the default address space.

  • S0 - natural stack alignment is unspecified

  • i1:8:8 - i1 is 8-bit (byte) aligned

  • i8:8:8 - i8 is 8-bit (byte) aligned as mandated

  • i16:16:16 - i16 is 16-bit aligned

  • i32:32:32 - i32 is 32-bit aligned

  • i64:32:64 - i64 has ABI alignment of 32-bits but preferred alignment of 64-bits

  • f16:16:16 - half is 16-bit aligned

  • f32:32:32 - float is 32-bit aligned

  • f64:64:64 - double is 64-bit aligned

  • f128:128:128 - quad is 128-bit aligned

  • v64:64:64 - 64-bit vector is 64-bit aligned

  • v128:128:128 - 128-bit vector is 128-bit aligned

  • a:0:64 - aggregates are 64-bit aligned

When LLVM is determining the alignment for a given type, it uses the following rules:

  1. If the type sought is an exact match for one of the specifications, that specification is used.

  2. If no match is found, and the type sought is an integer type, then the smallest integer type that is larger than the bitwidth of the sought type is used. If none of the specifications are larger than the bitwidth then the largest integer type is used. For example, given the default specifications above, the i7 type will use the alignment of i8 (next largest) while both i65 and i256 will use the alignment of i64 (largest specified).

The function of the data layout string may not be what you expect. Notably, this is not a specification from the frontend of what alignment the code generator should use.

Instead, if specified, the target data layout is required to match what the ultimate code generator expects. This string is used by the mid-level optimizers to improve code, and this only works if it matches what the ultimate code generator uses. There is no way to generate IR that does not embed this target-specific detail into the IR. If you don’t specify the string, the default specifications will be used to generate a Data Layout and the optimization phases will operate accordingly and introduce target specificity into the IR with respect to these default specifications.
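For orientation, the sketch below combines several of the specifications from this section into one string and annotates each component. The string is an assumption, loosely modeled on x86-64 ELF targets, and the symbol names are hypothetical; a real module must carry exactly the string its code generator expects.

```llvm
; Illustrative layout string, not taken from any particular target:
;   e            little-endian
;   m:e          ELF mangling
;   p:64:64      default address space: 64-bit pointers, 64-bit ABI alignment
;   i64:64       i64 has 64-bit ABI alignment (overrides the default i64:32:64)
;   f80:128      80-bit floating point is 128-bit aligned
;   n8:16:32:64  native integer widths
;   S128         natural stack alignment of 128 bits
target datalayout = "e-m:e-p:64:64-i64:64-f80:128-n8:16:32:64-S128"

; Under m:e a private symbol is emitted with a .L prefix (.Lhelper here).
define private void @helper() {
  ret void
}

; The \01 escape passes the rest of the name to the assembler unchanged,
; so this hypothetical symbol is emitted as exact_name.
define void @"\01exact_name"() {
  call void @helper()
  ret void
}
```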

Target Triple

A module may specify a target triple string that describes the target host. The syntax for the target triple is simply:

target triple = "x86_64-apple-macosx10.7.0"

The target triple string consists of a series of identifiers delimited by the minus sign character (‘-’). The canonical forms are:

ARCHITECTURE-VENDOR-OPERATING_SYSTEM
ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT

This information is passed along to the backend so that it generates code for the proper architecture. It’s possible to override this on the command line with the -mtriple command line option.
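As a small illustration of the four-component canonical form, the triple below is a commonly seen example rather than one prescribed by this document:

```llvm
; ARCHITECTURE = x86_64, VENDOR = unknown,
; OPERATING_SYSTEM = linux, ENVIRONMENT = gnu
target triple = "x86_64-unknown-linux-gnu"
```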

Allocated Objects

An allocated object, memory object, or simply object, is a region of a memory space that is reserved by a memory allocation such as alloca, heap allocation calls, and global variable definitions. Once it is allocated, the bytes stored in the region can only be read or written through a pointer that is based on the allocation value. If a pointer that is not based on the object tries to read or write to the object, it is undefined behavior.

The following properties hold for all allocated objects, otherwise the behavior is undefined:

  • no allocated object may cross the unsigned address space boundary (including the pointer after the end of the object),

  • the size of all allocated objects must be non-negative and not exceed the largest signed integer that fits into the index type.

Object Lifetime

A lifetime of an allocated object is a property that decides its accessibility. Unless stated otherwise, an allocated object is alive since its allocation, and dead after its deallocation. It is undefined behavior to access an allocated object that isn’t alive, but operations that don’t dereference it such as getelementptr, ptrtoint and icmp return a valid result. This explains code motion of these instructions across operations that impact the object’s lifetime. A stack object’s lifetime can be explicitly specified using llvm.lifetime.start and llvm.lifetime.end intrinsic function calls.

Pointer Aliasing Rules

Any memory access must be done through a pointer value associated with an address range of the memory access, otherwise the behavior is undefined. Pointer values are associated with address ranges according to the following rules:

  • A pointer value is associated with the addresses associated with any value it is based on.

  • An address of a global variable is associated with the address range of the variable’s storage.

  • The result value of an allocation instruction is associated with the address range of the allocated storage.

  • A null pointer in the default address-space is associated with no address.

  • An undef value in any address-space is associated with no address.

  • An integer constant other than zero or a pointer value returned from a function not defined within LLVM may be associated with address ranges allocated through mechanisms other than those provided by LLVM. Such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM.

A pointer value is based on another pointer value according to the following rules:

  • A pointer value formed from a scalar getelementptr operation is based on the pointer-typed operand of the getelementptr.

  • The pointer in lane l of the result of a vector getelementptr operation is based on the pointer in lane l of the vector-of-pointers-typed operand of the getelementptr.

  • The result value of a bitcast is based on the operand of the bitcast.

  • A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value.

  • The “based on” relationship is transitive.

Note that this definition of “based” is intentionally similar to the definition of “based” in C99, though it is slightly weaker.
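The “based on” rules can be traced through a short sketch. The function name and the offsets are hypothetical; the comments state which rule applies at each step.

```llvm
define i32 @based_on_example(ptr %p) {
  ; Scalar getelementptr: %q is based on %p.
  %q = getelementptr i8, ptr %p, i64 4
  ; %q contributes to the integer computation below.
  %i = ptrtoint ptr %q to i64
  %j = add i64 %i, 4
  ; inttoptr: %r is based on %q and, by transitivity, on %p.
  %r = inttoptr i64 %j to ptr
  ; The access goes through a pointer based on %p, so it is associated
  ; with the address ranges associated with %p.
  %v = load i32, ptr %r
  ret i32 %v
}
```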

LLVM IR does not associate types with memory. The result type of a load merely indicates the size and alignment of the memory from which to load, as well as the interpretation of the value. The first operand type of a store similarly only indicates the size and alignment of the store.

Consequently, type-based alias analysis, aka TBAA, aka -fstrict-aliasing, is not applicable to general unadorned LLVM IR. Metadata may be used to encode additional information which specialized optimization passes may use to implement type-based alias analysis.
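One consequence is that a value may be stored with one type and loaded with another of the same size. A minimal sketch, with a hypothetical function name:

```llvm
define i32 @float_bits(float %f) {
  %slot = alloca float
  ; The store writes four bytes; no type is recorded for the memory.
  store float %f, ptr %slot
  ; The load reinterprets the same four bytes as an i32.
  %bits = load i32, ptr %slot
  ret i32 %bits
}
```

Absent extra metadata, an optimizer cannot assume that the store and the load refer to different memory merely because their types differ.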

Pointer Capture

Given a function call and a pointer that is passed as an argument or stored in memory before the call, the call may capture two components of the pointer:

  • The address of the pointer, which is its integral value. This also includes parts of the address or any information about the address, including the fact that it does not equal one specific value. We further distinguish whether only the fact that the address is/isn’t null is captured.

  • The provenance of the pointer, which is the ability to perform memory accesses through the pointer, in the sense of the pointer aliasing rules. We further distinguish whether only read accesses are allowed, or both reads and writes.

For example, the following function captures the address of %a, because it is compared to a pointer, leaking information about the identity of the pointer:

@glb = global i8 0

define i1 @f(ptr %a) {
  %c = icmp eq ptr %a, @glb
  ret i1 %c
}

The function does not capture the provenance of the pointer, because the icmp instruction only operates on the pointer address. The following function captures both the address and provenance of the pointer, as both may be read from @glb after the function returns:

@glb = global ptr null

define void @f(ptr %a) {
  store ptr %a, ptr @glb
  ret void
}

The following function captures neither the address nor the provenance of the pointer:

define i32 @f(ptr %a) {
  %v = load i32, ptr %a
  ret i32 %v
}

While address capture includes uses of the address within the body of the function, provenance capture refers exclusively to the ability to perform accesses after the function returns. Memory accesses within the function itself are not considered pointer captures.

We can further say that the capture only occurs through a specific location. In the following example, the pointer (both address and provenance) is captured through the return value only:

define ptr @f(ptr %a) {
  %gep = getelementptr i8, ptr %a, i64 4
  ret ptr %gep
}

However, we always consider direct inspection of the pointer address (e.g. using ptrtoint) to be location-independent. The following example is not considered a return-only capture, even though the ptrtoint ultimately only contributes to the return value:

@lookup = constant [4 x i8] [i8 0, i8 1, i8 2, i8 3]

define ptr @f(ptr %a) {
  %a.addr = ptrtoint ptr %a to i64
  %mask = and i64 %a.addr, 3
  %gep = getelementptr i8, ptr @lookup, i64 %mask
  ret ptr %gep
}

This definition is chosen to allow capture analysis to continue with the return value in the usual fashion.

The following describes possible ways to capture a pointer in more detail, where unqualified uses of the word “capture” refer to capturing both address and provenance.

  1. The call stores any bit of the pointer carrying information into a place, and the stored bits can be read from the place by the caller after this call exits.

@glb = global ptr null
@glb2 = global ptr null
@glb3 = global ptr null
@glbi = global i32 0

define ptr @f(ptr %a, ptr %b, ptr %c, ptr %d, ptr %e) {
  store ptr %a, ptr @glb ; %a is captured by this call

  store ptr %b, ptr @glb2 ; %b isn't captured because the stored value is overwritten by the store below
  store ptr null, ptr @glb2

  store ptr %c, ptr @glb3
  call void @g() ; If @g makes a copy of %c that outlives this call (@f), %c is captured
  store ptr null, ptr @glb3

  %i = ptrtoint ptr %d to i64
  %j = trunc i64 %i to i32
  store i32 %j, ptr @glbi ; %d is captured

  ret ptr %e ; %e is captured
}

  2. The call stores any bit of the pointer carrying information into a place, and the stored bits can be safely read from the place by another thread via synchronization.

@lock = global i1 true

define void @f(ptr %a) {
  store ptr %a, ptr @glb
  store atomic i1 false, ptr @lock release ; %a is captured because another thread can safely read @glb
  store ptr null, ptr @glb
  ret void
}

  3. The call’s behavior depends on any bit of the pointer carrying information (address capture only).

@glb = global i8 0

define void @f(ptr %a) {
  %c = icmp eq ptr %a, @glb
  br i1 %c, label %BB_EXIT, label %BB_CONTINUE ; captures address of %a only
BB_EXIT:
  call void @exit()
  unreachable
BB_CONTINUE:
  ret void
}

  4. The pointer is used as the pointer operand of a volatile access.

Volatile Memory Accesses

Certain memory accesses, such as load’s, store’s, and llvm.memcpy’s may be marked volatile. The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior.
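A minimal sketch of the first rule, using a hypothetical function that reads the same location twice:

```llvm
define i32 @read_twice(ptr %reg) {
  ; Both loads are volatile, so the optimizer must keep exactly two of
  ; them, in this order, even though they read the same address.
  %first = load volatile i32, ptr %reg
  %second = load volatile i32, ptr %reg
  %sum = add i32 %first, %second
  ret i32 %sum
}
```

Had the loads not been volatile, an optimizer would be free to reuse the first result for the second.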

A volatile load or store may have additional target-specific semantics. Any volatile operation can have side effects, and any volatile operation can read and/or modify state which is not accessible via a regular load or store in this module. Volatile operations may use addresses which do not point to memory (like MMIO registers). This means the compiler may not use a volatile operation to prove a non-volatile access to that address has defined behavior. This includes addresses typically forbidden, such as the pointer with bit-value 0.

The allowed side-effects for volatile accesses are limited. If a non-volatile store to a given address would be legal, a volatile operation may modify the memory at that address. A volatile operation may not modify any other memory accessible by the module being compiled. A volatile operation may not call any code in the current module.

In general (without target specific context), the address space of a volatile operation may not be changed. Different address spaces may have different trapping behavior when dereferencing an invalid pointer.

The compiler may assume execution will continue after a volatile operation, so operations which modify memory or may have undefined behavior can be hoisted past a volatile operation.

As an exception to the preceding rule, the compiler may not assume execution will continue after a volatile store operation. This restriction is necessary to support the somewhat common pattern in C of intentionally storing to an invalid pointer to crash the program. In the future, it might make sense to allow frontends to control this behavior.

IR-level volatile loads and stores cannot safely be optimized into llvm.memcpy or llvm.memmove intrinsics even when those intrinsics are flagged volatile. Likewise, the backend should never split or merge target-legal volatile load/store instructions. Similarly, IR-level volatile loads and stores cannot change from integer to floating-point or vice versa.

Rationale

Platforms may rely on volatile loads and stores of natively supported data width to be executed as single instruction. For example, in C this holds for an l-value of volatile primitive type with native hardware support, but not necessarily for aggregate types. The frontend upholds these expectations, which are intentionally unspecified in the IR. The rules above ensure that IR transformations do not violate the frontend’s contract with the language.

Memory Model for Concurrent Operations

The LLVM IR does not define any way to start parallel threads of execution or to register signal handlers. Nonetheless, there are platform-specific ways to create them, and we define LLVM IR’s behavior in their presence. This model is inspired by the C++ memory model.

For a more informal introduction to this model, see the LLVM Atomic Instructions and Concurrency Guide.

We define a happens-before partial order as the least partial order that

  • Is a superset of single-thread program order, and

  • When a synchronizes-with b, includes an edge from a to b. Synchronizes-with pairs are introduced by platform-specific techniques, like pthread locks, thread creation, thread joining, etc., and by atomic instructions. (See also Atomic Memory Ordering Constraints).

Note that program order does not introduce happens-before edges between a thread and signals executing inside that thread.

Every (defined) read operation (load instructions, memcpy, atomic loads/read-modify-writes, etc.) R reads a series of bytes written by (defined) write operations (store instructions, atomic stores/read-modify-writes, memcpy, etc.). For the purposes of this section, initialized globals are considered to have a write of the initializer which is atomic and happens before any other read or write of the memory in question. For each byte of a read R, R_byte may see any write to the same byte, except:

  • If write_1 happens before write_2, and write_2 happens before R_byte, then R_byte does not see write_1.

  • If R_byte happens before write_3, then R_byte does not see write_3.

Given that definition, R_byte is defined as follows:

  • If R is volatile, the result is target-dependent. (Volatile is supposed to give guarantees which can support sig_atomic_t in C/C++, and may be used for accesses to addresses that do not behave like normal memory. It does not generally provide cross-thread synchronization.)

  • Otherwise, if there is no write to the same byte that happens before R_byte, R_byte returns undef for that byte.

  • Otherwise, if R_byte may see exactly one write, R_byte returns the value written by that write.

  • Otherwise, if R is atomic, and all the writes R_byte may see are atomic, it chooses one of the values written. See the Atomic Memory Ordering Constraints section for additional constraints on how the choice is made.

  • Otherwise R_byte returns undef.

R returns the value composed of the series of bytes it read. This implies that some bytes within the value may be undef without the entire value being undef. Note that this only defines the semantics of the operation; it doesn’t mean that targets will emit more than one instruction to read the series of bytes.

Note that in cases where none of the atomic intrinsics are used, this model places only one restriction on IR transformations on top of what is required for single-threaded execution: introducing a store to a byte which might not otherwise be stored is not allowed in general. (Specifically, in the case where another thread might write to and read from an address, introducing a store can change a load that may see exactly one write into a load that may see multiple writes.)
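The restriction on introducing stores can be sketched with two hypothetical functions. The first stores only when %c is true; the second is a rewrite that looks equivalent in a single-threaded setting but stores on every path, which this model does not allow in general.

```llvm
; Original: the store happens only when %c is true.
define void @cond_store(ptr %p, i1 %c) {
entry:
  br i1 %c, label %then, label %end
then:
  store i32 1, ptr %p
  br label %end
end:
  ret void
}

; Not a valid rewrite in general: when %c is false this version writes
; the old value back, introducing a store that another thread's load
; might now see alongside that thread's own write.
define void @cond_store_rewritten(ptr %p, i1 %c) {
  %old = load i32, ptr %p
  %new = select i1 %c, i32 1, i32 %old
  store i32 %new, ptr %p
  ret void
}
```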

Atomic Memory Ordering Constraints

Atomic instructions (cmpxchg, atomicrmw, fence, atomic load, and atomic store) take ordering parameters that determine which other atomic instructions on the same address they synchronize with. These semantics implement the Java or C++ memory models; if these descriptions aren’t precise enough, check those specs (see spec references in the atomics guide). fence instructions treat these orderings somewhat differently since they don’t take an address. See that instruction’s documentation for details.

For a simpler introduction to the ordering constraints, see the LLVM Atomic Instructions and Concurrency Guide.

unordered

The set of values that can be read is governed by the happens-before partial order. A value cannot be read unless some operation wrote it. This is intended to provide a guarantee strong enough to model Java’s non-volatile shared variables. This ordering cannot be specified for read-modify-write operations; it is not strong enough to make them atomic in any interesting way.

monotonic

In addition to the guarantees of unordered, there is a single total order for modifications by monotonic operations on each address. All modification orders must be compatible with the happens-before order. There is no guarantee that the modification orders can be combined to a global total order for the whole program (and this often will not be possible). The read in an atomic read-modify-write operation (cmpxchg and atomicrmw) reads the value in the modification order immediately before the value it writes. If one atomic read happens before another atomic read of the same address, the later read must see the same value or a later value in the address’s modification order. This disallows reordering of monotonic (or stronger) operations on the same address. If an address is written monotonic-ally by one thread, and other threads monotonic-ally read that address repeatedly, the other threads must eventually see the write. This corresponds to the C/C++ memory_order_relaxed.

acquire

In addition to the guarantees of monotonic, a synchronizes-with edge may be formed with a release operation. This is intended to model C/C++’s memory_order_acquire.

release

In addition to the guarantees of monotonic, if this operation writes a value which is subsequently read by an acquire operation, it synchronizes-with that operation. Furthermore, this occurs even if the value written by a release operation has been modified by a read-modify-write operation before being read. (Such a set of operations comprises a release sequence). This corresponds to the C/C++ memory_order_release.

acq_rel (acquire+release)

Acts as both an acquire and release operation on its address. This corresponds to the C/C++ memory_order_acq_rel.

seq_cst (sequentially consistent)

In addition to the guarantees of acq_rel (acquire for an operation that only reads, release for an operation that only writes), there is a global total order on all sequentially-consistent operations on all addresses. Each sequentially-consistent read sees the last preceding write to the same address in this global order. This corresponds to the C/C++ memory_order_seq_cst and Java volatile.

Note: this global total order is not guaranteed to be fully consistent with the happens-before partial order if non-seq_cst accesses are involved. See the C++ standard [atomics.order] section for more details on the exact guarantees.
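The interplay of release and acquire can be sketched with a message-passing pattern. The globals and function names are hypothetical, and the two functions are assumed to run on different threads created by some platform-specific mechanism.

```llvm
@data = global i32 0
@ready = global i32 0

define void @producer() {
  ; Plain store, ordered before the release store by program order.
  store i32 42, ptr @data
  ; Release store: publishes everything that happens before it.
  store atomic i32 1, ptr @ready release, align 4
  ret void
}

define i32 @consumer() {
entry:
  br label %spin
spin:
  ; Acquire load: once it reads the value written by the release store,
  ; a synchronizes-with edge is formed.
  %r = load atomic i32, ptr @ready acquire, align 4
  %done = icmp eq i32 %r, 1
  br i1 %done, label %read, label %spin
read:
  ; The store to @data happens before this load, so it sees that write.
  %v = load i32, ptr @data
  ret i32 %v
}
```

With monotonic in place of release and acquire, no synchronizes-with edge would be formed and the final load could race with the store to @data.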

If an atomic operation is marked syncscope("singlethread"), it only synchronizes with and only participates in the seq_cst total orderings of other operations running in the same thread (for example, in signal handlers).

If an atomic operation is marked syncscope("<target-scope>"), where <target-scope> is a target specific synchronization scope, then it is target dependent if it synchronizes with and participates in the seq_cst total orderings of other operations.

Otherwise, an atomic operation that is not marked syncscope("singlethread") or syncscope("<target-scope>") synchronizes with and participates in the seq_cst total orderings of other operations that are not marked syncscope("singlethread") or syncscope("<target-scope>").
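For reference, a brief sketch of where the syncscope marker appears in the syntax. The function name is hypothetical.

```llvm
define void @signal_safe_publish(ptr %flag) {
  ; Only synchronizes with operations in the same thread, such as a
  ; signal handler running inside it.
  store atomic i32 1, ptr %flag syncscope("singlethread") release, align 4
  ; A fence restricted to the same scope.
  fence syncscope("singlethread") seq_cst
  ret void
}
```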

Floating-Point Environment

The default LLVM floating-point environment assumes that traps are disabled and status flags are not observable. Therefore, floating-point math operations do not have side effects and may be speculated freely. Results assume the round-to-nearest rounding mode, and subnormals are assumed to be preserved.

Running LLVM code in an environment where these assumptions are not met typically leads to undefined behavior. The strictfp and denormal-fp-math attributes as well as Constrained Floating-Point Intrinsics can be used to weaken LLVM’s assumptions and ensure defined behavior in non-default floating-point environments; see their respective documentation for details.

Behavior of Floating-Point NaN values

A floating-point NaN value consists of a sign bit, a quiet/signaling bit, and a payload (which makes up the rest of the mantissa except for the quiet/signaling bit). LLVM assumes that the quiet/signaling bit being set to 1 indicates a quiet NaN (QNaN), and a value of 0 indicates a signaling NaN (SNaN). In the following we will hence just call it the “quiet bit”.

The representation bits of a floating-point value do not mutate arbitrarily; in particular, if there is no floating-point operation being performed, NaN signs, quiet bits, and payloads are preserved.

For the purpose of this section, bitcast as well as the following operations are not “floating-point math operations”: fneg, llvm.fabs, and llvm.copysign. These operations act directly on the underlying bit representation and never change anything except possibly for the sign bit.
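A short sketch of the sign-only operations, using a hypothetical function name. If %x is a NaN, its quiet bit and payload pass through both operations unchanged.

```llvm
declare float @llvm.fabs.f32(float)

define float @sign_only(float %x) {
  ; Flips only the sign bit.
  %n = fneg float %x
  ; Clears only the sign bit.
  %a = call float @llvm.fabs.f32(float %n)
  ret float %a
}
```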

Floating-point math operations that return a NaN are an exception from the general principle that LLVM implements IEEE-754 semantics. Unless specified otherwise, the following rules apply whenever the IEEE-754 semantics say that a NaN value is returned: the result has a non-deterministic sign; the quiet bit and payload are non-deterministically chosen from the following set of options:

  • The quiet bit is set and the payload is all-zero. (“Preferred NaN” case)

  • The quiet bit is set and the payload is copied from any input operand that is a NaN. (“Quieting NaN propagation” case)

  • The quiet bit and payload are copied from any input operand that is a NaN. (“Unchanged NaN propagation” case)

  • The quiet bit is set and the payload is picked from a target-specific set of “extra” possible NaN payloads. The set can depend on the input operand values. This set is empty on x86 and ARM, but can be non-empty on other architectures. (For instance, on wasm, if any input NaN does not have the preferred all-zero payload or any input NaN is an SNaN, then this set contains all possible payloads; otherwise, it is empty. On SPARC, this set consists of the all-one payload.)

In particular, if all input NaNs are quiet (or if there are no input NaNs), then the output NaN is definitely quiet. Signaling NaN outputs can only occur if they are provided as an input value. For example, “fmul SNaN, 1.0” may be simplified to SNaN rather than QNaN. Similarly, if all input NaNs are preferred (or if there are no input NaNs) and the target does not have any “extra” NaN payloads, then the output NaN is guaranteed to be preferred.
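The “Unchanged NaN propagation” case is what permits the fold mentioned above. As a sketch, with hypothetical function names, the first function may be simplified to the second:

```llvm
define float @mul_by_one(float %x) {
  ; If %x is an SNaN, IEEE-754 alone would call for a quieted result.
  %r = fmul float %x, 1.000000e+00
  ret float %r
}

; Permitted simplification: the input, SNaN or not, is returned as is.
define float @mul_by_one_folded(float %x) {
  ret float %x
}
```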

Floating-point math operations are allowed to treat all NaNs as if they were quiet NaNs. For example, “pow(1.0, SNaN)” may be simplified to 1.0.

Code that requires different behavior than this should use the Constrained Floating-Point Intrinsics. In particular, constrained intrinsics rule out the “Unchanged NaN propagation” case; they are guaranteed to return a QNaN.

Unfortunately, due to hard-or-impossible-to-fix issues, LLVM violates its own specification on some architectures:

  • x86-32 without SSE2 enabled may convert floating-point values to x86_fp80 and back when performing floating-point math operations; this can lead to results with different precision than expected and it can alter NaN values. Since optimizations can make contradicting assumptions, this can lead to arbitrary miscompilations. See issue #44218.

  • x86-32 (even with SSE2 enabled) may implicitly perform such a conversion on values returned from a function for some calling conventions. See issue #66803.

  • Older MIPS versions use the opposite polarity for the quiet/signaling bit, and LLVM does not correctly represent this. See issue #60796.

Floating-Point Semantics

This section defines the semantics for core floating-point operations on types that use a format specified by IEEE-754. These types are: half, float, double, and fp128, which correspond to the binary16, binary32, binary64, and binary128 formats, respectively. The “core” operations are those defined in section 5 of IEEE-754, which all have corresponding LLVM operations.

The value returned by those operations matches that of the corresponding IEEE-754 operation executed in the default LLVM floating-point environment, except that the behavior of NaN results is instead as specified here. In particular, such a floating-point instruction returning a non-NaN value is guaranteed to always return the same bit-identical result on all machines and optimization levels.

This means that optimizations and backends may not change the observed bitwise result of these operations in any way (unless NaNs are returned), and frontends can rely on these operations providing correctly rounded results as described in the standard.

(Note that this is only about the value returned by these operations; see the floating-point environment section regarding flags and exceptions.)

Various flags, attributes, and metadata can alter the behavior of these operations and thus make them not bit-identical across machines and optimization levels any more: most notably, the fast-math flags as well as the strictfp and denormal-fp-math attributes and fpmath metadata. See their corresponding documentation for details.

Fast-Math Flags

LLVM IR floating-point operations (fneg, fadd, fsub, fmul, fdiv, frem, fcmp, fptrunc, fpext), and phi, select, or call instructions that return floating-point types may use the following flags to enable otherwise unsafe floating-point transformations.

fast

This flag is a shorthand for specifying all fast-math flags at once, and imparts no additional semantics from using all of them.

nnan

No NaNs - Allow optimizations to assume the arguments and result are not NaN. If an argument is a NaN, or the result would be a NaN, it produces a poison value instead.

ninf

No Infs - Allow optimizations to assume the arguments and result are not +/-Inf. If an argument is +/-Inf, or the result would be +/-Inf, it produces a poison value instead.

nsz

No Signed Zeros - Allow optimizations to treat the sign of a zero argument or zero result as insignificant. This does not imply that -0.0 is poison and/or guaranteed to not exist in the operation.

Note: For phi, select, and call instructions, the following return types are considered to be floating-point types:

  • Floating-point scalar or vector types

  • Array types (nested to any depth) of floating-point scalar or vector types

  • Homogeneous literal struct types of floating-point scalar or vector types

Rewrite-based flags

The following flags have rewrite-based semantics. These flags allow expressions, potentially containing multiple non-consecutive instructions, to be rewritten into alternative instructions. When multiple instructions are involved in an expression, it is necessary that all of the instructions have the necessary rewrite-based flag present on them, and the rewritten instructions will generally have the intersection of the flags present on the input instructions.

In the following example, the floating-point expression in the body of @orig has contract and reassoc in common, and thus if it is rewritten into the expression in the body of @target, all of the new instructions get those two flags and only those flags as a result. Since arcp is present on only one of the instructions in the expression, it is not present in the transformed expression. Furthermore, this reassociation here is only legal because both the instructions had the reassoc flag; if only one had it, it would not be legal to make the transformation.

define double @orig(double %a, double %b, double %c) {
  %t1 = fmul contract reassoc double %a, %b
  %val = fmul contract reassoc arcp double %t1, %c
  ret double %val
}

define double @target(double %a, double %b, double %c) {
  %t1 = fmul contract reassoc double %b, %c
  %val = fmul contract reassoc double %a, %t1
  ret double %val
}

These rules do not apply to the other fast-math flags. Whether or not a flag like nnan is present on any or all of the rewritten instructions is based on whether or not it is possible for said instruction to have a NaN input or output, given the original flags.

arcp

Allows division to be treated as a multiplication by a reciprocal. Specifically, this permits a / b to be considered equivalent to a * (1.0 / b) (which may subsequently be susceptible to code motion), and it also permits a / (b / c) to be considered equivalent to a * (c / b). Both of these rewrites can be applied in either direction: a * (c / b) can be rewritten into a / (b / c).

contract

Allow floating-point contraction (e.g. fusing a multiply followed by an addition into a fused multiply-and-add). This does not enable reassociation to form arbitrary contractions. For example, (a*b) + (c*d) + e can not be transformed into (a*b) + ((c*d) + e) to create two fma operations.

afn

Approximate functions - Allow substitution of approximate calculations for functions (sin, log, sqrt, etc). See floating-point intrinsic definitions for places where this can apply to LLVM’s intrinsic math functions.

reassoc

Allow algebraically equivalent transformations for floating-point instructions such as reassociation transformations. This may dramatically change results in floating-point.
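To see why these rewrites need explicit flags, the following Python sketch evaluates both sides of an arcp, a reassoc, and a contract rewrite in IEEE-754 binary64 arithmetic (Python's float). It is an illustration only: the operand values are chosen by hand, and the fused multiply-add is emulated with exact rational arithmetic followed by a single rounding rather than with a hardware fma.

```python
from fractions import Fraction

# arcp: a / b versus a * (1.0 / b)
a, b = 49.0, 49.0
div_direct = a / b            # exactly 1.0
div_recip = a * (1.0 / b)     # the reciprocal is rounded first, so this is not 1.0

# reassoc: (x + y) + z versus x + (y + z)
x, y, z = 1e16, -1e16, 1.0
left = (x + y) + z            # 0.0 + 1.0
right = x + (y + z)           # y + z rounds back to -1e16

# contract: p*p + r with two roundings versus one rounding (fused)
p = 1.0 + 2.0 ** -27
r = -(1.0 + 2.0 ** -26)
unfused = p * p + r           # the product is rounded before the addition
fused = float(Fraction(p) * Fraction(p) + Fraction(r))  # exact, then rounded once
```

Each pair is algebraically equal but produces different bits, which is why a transformation between them is only permitted when every instruction involved carries the corresponding flag.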

Use-list Order Directives

Use-list directives encode the in-memory order of each use-list, allowing the order to be recreated. <order-indexes> is a comma-separated list of indexes that are assigned to the referenced value’s uses. The referenced value’s use-list is immediately sorted by these indexes.

Use-list directives may appear at function scope or global scope. They are not instructions, and have no effect on the semantics of the IR. When they’re at function scope, they must appear after the terminator of the final basic block.

If basic blocks have their address taken via blockaddress() expressions, uselistorder_bb can be used to reorder their use-lists from outside their function’s scope.

Syntax:

uselistorder <ty> <value>, { <order-indexes> }
uselistorder_bb @function, %block { <order-indexes> }

Examples:

define void @foo(i32 %arg1, i32 %arg2) {
entry:
  ; ... instructions ...
bb:
  ; ... instructions ...

  ; At function scope.
  uselistorder i32 %arg1, { 1, 0, 2 }
  uselistorder label %bb, { 1, 0 }
}

; At global scope.
uselistorder ptr @global, { 1, 2, 0 }
uselistorder i32 7, { 1, 0 }
uselistorder i32 (i32) @bar, { 1, 0 }
uselistorder_bb @foo, %bb, { 5, 1, 3, 2, 0, 4 }
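The "assign indexes, then sort" behavior described above can be modeled in a few lines of Python. This is a sketch of the wording in this section, not of LLVM's implementation; `apply_uselistorder` and the use names are hypothetical.

```python
def apply_uselistorder(uses, order_indexes):
    """Assign one index to each use, then sort the use-list by those indexes."""
    if sorted(order_indexes) != list(range(len(uses))):
        raise ValueError("order indexes must be a permutation of 0..n-1")
    # Indexes are unique, so sorting the pairs never compares the uses themselves.
    paired = sorted(zip(order_indexes, uses))
    return [use for _, use in paired]


# Models: uselistorder i32 %arg1, { 1, 0, 2 }
reordered = apply_uselistorder(["use_a", "use_b", "use_c"], [1, 0, 2])
```

Here `use_b` receives index 0 and therefore moves to the front of the modeled use-list.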

Source Filename

The source filename string is set to the original module identifier, which will be the name of the compiled source file when compiling from source through the clang front end, for example. It is then preserved through the IR and bitcode.

This is currently necessary to generate a consistent unique global identifier for local functions used in profile data, which prepends the source file name to the local function name.

The syntax for the source file name is simply:

source_filename = "/path/to/source.c"

Type System

The LLVM type system is one of the most important features of the intermediate representation. Being typed enables a number of optimizations to be performed on the intermediate representation directly, without having to do extra analyses on the side before the transformation. A strong type system makes it easier to read the generated code and enables novel analyses and transformations that are not feasible to perform on normal three address code representations.

Void Type

Overview:

The void type does not represent any value and has no size.

Syntax:

void

Function Type

Overview:

The function type can be thought of as a function signature. It consists of a return type and a list of formal parameter types. The return type of a function type is a void type or first class type, except for label and metadata types.

Syntax:

<returntype> (<parameter list>)

…where ‘<parameter list>’ is a comma-separated list of type specifiers. Optionally, the parameter list may include a type ..., which indicates that the function takes a variable number of arguments. Variable argument functions can access their arguments with the variable argument handling intrinsic functions. ‘<returntype>’ is any type except label and metadata.

Examples:

i32 (i32)

A function taking an i32, returning an i32.

i32 (ptr, ...)

A vararg function that takes at least one pointer argument and returns an integer. This is the signature for printf in LLVM.

{i32, i32} (i32)

A function taking an i32, returning a structure containing two i32 values.

Opaque Structure Types

Overview:

Opaque structure types are used to represent structure types that do not have a body specified. This corresponds (for example) to the C notion of a forward declared structure. They can be named (%X) or unnamed (%52).

It is not possible to create SSA values with an opaque structure type. In practice, this largely limits their use to the value type of external globals.

Syntax:

%X = type opaque
%52 = type opaque

@g = external global %X

First Class Types

The first class types are perhaps the most important. Values of these types are the only ones which can be produced by instructions.

Single Value Types

These are the types that are valid in registers from CodeGen’s perspective.

Integer Type

Overview:

The integer type is a very simple type that specifies an arbitrary bit width for the integer type desired. Any bit width from 1 bit to 2^23 (about 8 million) can be specified.

Syntax:

iN

The number of bits the integer will occupy is specified by the N value.

Examples:

i1

a single-bit integer.

i32

a 32-bit integer.

i1942652

a really big integer of over 1 million bits.

Floating-Point Types

Type

Description

half

16-bit floating-point value (IEEE-754 binary16)

bfloat

16-bit “brain” floating-point value (7-bit significand). Provides the same number of exponent bits as float, so that it matches its dynamic range, but with greatly reduced precision. Used in Intel’s AVX-512 BF16 extensions and Arm’s ARMv8.6-A extensions, among others.

float

32-bit floating-point value (IEEE-754 binary32)

double

64-bit floating-point value (IEEE-754 binary64)

fp128

128-bit floating-point value (IEEE-754 binary128)

x86_fp80

80-bit floating-point value (X87)

ppc_fp128

128-bit floating-point value (two 64-bits)

X86_amx Type

Overview:

The x86_amx type represents a value held in an AMX tile register on an x86 machine. The operations allowed on it are quite limited. Only a few intrinsics are allowed: stride load and store, zero and dot product. No instruction is allowed for this type. There are no arguments, arrays, pointers, vectors or constants of this type.

Syntax:

x86_amx

Pointer Type

Overview:

The pointer type ptr is used to specify memory locations. Pointers are commonly used to reference objects in memory.

Pointer types may have an optional address space attribute defining the numbered address space where the pointed-to object resides. For example, ptr addrspace(5) is a pointer to address space 5. In addition to integer constants, addrspace can also reference one of the address spaces defined in the datalayout string. addrspace("A") will use the alloca address space, addrspace("G") the default globals address space and addrspace("P") the program address space.

The representation of pointers can be different for each address space and does not necessarily need to be a plain integer address (e.g. for non-integral pointers). In addition to a representation size in bits, pointers in each address space also have an index size which defines the bitwidth of indexing operations as well as the size of integer addresses in this address space. For example, CHERI capabilities are twice the size of the underlying addresses to accommodate additional metadata such as bounds and permissions: on a 32-bit system the bitwidth of the pointer representation is 64, but the underlying address width remains 32 bits.

The default address space is number zero.

The semantics of non-zero address spaces are target-specific. Memory access through a non-dereferenceable pointer is undefined behavior in any address space. Pointers with the bit-value 0 are only assumed to be non-dereferenceable in address space 0, unless the function is marked with the null_pointer_is_valid attribute. However, volatile access to any non-dereferenceable address may have defined behavior (according to the target), and in this case the attribute is not needed even for address 0.

If an object can be proven accessible through a pointer with a different address space, the access may be modified to use that address space. Exceptions apply if the operation is volatile.

Prior to LLVM 15, pointer types also specified a pointee type, such as i8*, [4 x i32]* or i32 (i32*)*. In LLVM 15, such “typed pointers” are still supported under non-default options. See the opaque pointers document for more information.

Target Extension Type

Overview:

Target extension types represent types that must be preserved through optimization, but are otherwise generally opaque to the compiler. They may be used as function parameters or arguments, and in phi or select instructions. Some types may be also used in alloca instructions or as global values, and correspondingly it is legal to use load and store instructions on them. Full semantics for these types are defined by the target.

The only constants that target extension types may have are zeroinitializer, undef, and poison. Other possible values for target extension types may arise from target-specific intrinsics and functions.

These types cannot be converted to other types. As such, it is not legal to use them in bitcast instructions (as a source or target type), nor is it legal to use them in ptrtoint or inttoptr instructions. Similarly, they are not legal to use in an icmp instruction.

Target extension types have a name and optional type or integer parameters. The meanings of name and parameters are defined by the target. When being defined in LLVM IR, all of the type parameters must precede all of the integer parameters.

Specific target extension types are registered with LLVM as having specific properties. These properties can be used to restrict the type from appearing in certain contexts, such as being the type of a global variable or having a zeroinitializer constant be valid. A complete list of type properties may be found in the documentation for llvm::TargetExtType::Property (doxygen).

Syntax:

target("label")
target("label", void)
target("label", void, i32)
target("label", 0, 1, 2)
target("label", void, i32, 0, 1, 2)

Vector Type

Overview:

A vector type is a simple derived type that represents a vector of elements. Vector types are used when multiple primitive data are operated in parallel using a single instruction (SIMD). A vector type requires a size (number of elements), an underlying primitive data type, and a scalable property to represent vectors where the exact hardware vector length is unknown at compile time. Vector types are considered first class.

Memory Layout:

In general vector elements are laid out in memory in the same way as array types. Such an analogy works fine as long as the vector elements are byte sized. However, when the elements of the vector aren’t byte sized it gets a bit more complicated. One way to describe the layout is by describing what happens when a vector such as <N x iM> is bitcast to an integer type with N*M bits, and then following the rules for storing such an integer to memory.

A bitcast from a vector type to a scalar integer type will see the elements being packed together (without padding). The order in which elements are inserted in the integer depends on endianness. For little endian element zero is put in the least significant bits of the integer, and for big endian element zero is put in the most significant bits.

Using a vector such as <i4 1, i4 2, i4 3, i4 5> as an example, together with the analogy that we can replace a vector store by a bitcast followed by an integer store, we get this for big endian:

%val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16

; Bitcasting from a vector to an integral type can be seen as
; concatenating the values:
;   %val now has the hexadecimal value 0x1235.

store i16 %val, ptr %ptr

; In memory the content will be (8-bit addressing):
;
;    [%ptr + 0]: 00010010  (0x12)
;    [%ptr + 1]: 00110101  (0x35)

The same example for little endian:

%val = bitcast <4 x i4> <i4 1, i4 2, i4 3, i4 5> to i16

; Bitcasting from a vector to an integral type can be seen as
; concatenating the values:
;   %val now has the hexadecimal value 0x5321.

store i16 %val, ptr %ptr

; In memory the content will be (8-bit addressing):
;
;    [%ptr + 0]: 00100001  (0x21)
;    [%ptr + 1]: 01010011  (0x53)

When <N*M> isn’t evenly divisible by the byte size the exact memory layout is unspecified (just like it is for an integral type of the same size). This is because different targets could put the padding at different positions when the type size is smaller than the type’s store size.
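The bitcast-then-store analogy can be checked with a short Python model. It is a sketch under the rules stated in this section only: the helper names `bitcast_vector_to_int` and `store_bytes` are hypothetical, and the model covers just the case where N*M is a multiple of 8.

```python
def bitcast_vector_to_int(elements, elem_bits, big_endian):
    """Pack vector elements into one integer without padding."""
    mask = (1 << elem_bits) - 1
    value = 0
    for index, element in enumerate(elements):
        if big_endian:
            # Element zero lands in the most significant bits.
            shift = (len(elements) - 1 - index) * elem_bits
        else:
            # Element zero lands in the least significant bits.
            shift = index * elem_bits
        value |= (element & mask) << shift
    return value


def store_bytes(value, total_bits, big_endian):
    """Return the bytes an integer store of `value` would write."""
    if total_bits % 8 != 0:
        raise ValueError("layout is unspecified when N*M is not a multiple of 8")
    return value.to_bytes(total_bits // 8, "big" if big_endian else "little")


vector = [1, 2, 3, 5]  # models <i4 1, i4 2, i4 3, i4 5>
big = bitcast_vector_to_int(vector, 4, big_endian=True)
little = bitcast_vector_to_int(vector, 4, big_endian=False)
```

With these inputs the model reproduces the values in the two examples: 0x1235 stored as bytes 0x12, 0x35 for big endian, and 0x5321 stored as bytes 0x21, 0x53 for little endian.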

Syntax:

< <# elements> x <elementtype> >          ; Fixed-length vector
< vscale x <# elements> x <elementtype> > ; Scalable vector

The number of elements is a constant integer value larger than 0; elementtype may be any integer, floating-point, pointer type, or a sized target extension type that has the CanBeVectorElement property. Vectors of size zero are not allowed. For scalable vectors, the total number of elements is a constant multiple (called vscale) of the specified number of elements; vscale is a positive integer that is unknown at compile time and the same hardware-dependent constant for all scalable vectors at run time. The size of a specific scalable vector type is thus constant within IR, even if the exact size in bytes cannot be determined until run time.

Examples:

<4 x i32>

Vector of 4 32-bit integer values.

<8 x float>

Vector of 8 32-bit floating-point values.

<2 x i64>

Vector of 2 64-bit integer values.

<4 x ptr>

Vector of 4 pointers.

<vscale x 4 x i32>

Vector with a multiple of 4 32-bit integer values.

Label Type

Overview:

The label type represents code labels.

Syntax:

label

Token Type

Overview:

The token type is used when a value is associated with an instruction but all uses of the value must not attempt to introspect or obscure it. As such, it is not appropriate to have a phi or select of type token.

Syntax:

token

Metadata Type

Overview:

The metadata type represents embedded metadata. No derived types may be created from metadata except for function arguments.

Syntax:

metadata

Aggregate Types

Aggregate Types are a subset of derived types that can contain multiple member types. Arrays and structs are aggregate types. Vectors are not considered to be aggregate types.

Array Type

Overview:

The array type is a very simple derived type that arranges elements sequentially in memory. The array type requires a size (number of elements) and an underlying data type.

Syntax:

[<# elements> x <elementtype>]

The number of elements is a constant integer value; elementtype may be any type with a size.

Examples:

[40 x i32]

Array of 40 32-bit integer values.

[41 x i32]

Array of 41 32-bit integer values.

[4 x i8]

Array of 4 8-bit integer values.

Here are some examples of multidimensional arrays:

[3 x [4 x i32]]

3x4 array of 32-bit integer values.

[12 x [10 x float]]

12x10 array of single precision floating-point values.

[2 x [3 x [4 x i16]]]

2x3x4 array of 16-bit integer values.

There is no restriction on indexing beyond the end of the array implied by a static type (though there are restrictions on indexing beyond the bounds of an allocated object in some cases). This means that single-dimension ‘variable sized array’ addressing can be implemented in LLVM with a zero length array type. An implementation of ‘pascal style arrays’ in LLVM could use the type “{ i32, [0 x float]}”, for example.

Structure Type

Overview:

The structure type is used to represent a collection of data members together in memory. The elements of a structure may be any type that has a size.

Structures in memory are accessed using ‘load’ and ‘store’ by getting a pointer to a field with the ‘getelementptr’ instruction. Structures in registers are accessed using the ‘extractvalue’ and ‘insertvalue’ instructions.

Structures may optionally be “packed” structures, which indicate that the alignment of the struct is one byte, and that there is no padding between the elements. In non-packed structs, padding between field types is inserted as defined by the DataLayout string in the module, which is required to match what the underlying code generator expects.

Structures can either be “literal” or “identified”. A literal structure is defined inline with other types (e.g. [2 x {i32, i32}]) whereas identified types are always defined at the top level with a name. Literal types are uniqued by their contents and can never be recursive or opaque since there is no way to write one. Identified types can be opaque and are never uniqued. Identified types must not be recursive.

Syntax:

%T1 = type { <type list> }     ; Identified normal struct type
%T2 = type <{ <type list> }>   ; Identified packed struct type

Examples:

{ i32, i32, i32 }

A triple of three i32 values (this is a “homogeneous” struct as all element types are the same).

{ float, ptr }

A pair, where the first element is a float and the second element is a pointer.

<{ i8, i32 }>

A packed struct known to be 5 bytes in size.
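The 5-byte size of the packed example, and the effect of padding in the non-packed form, can be worked out with a small Python sketch. It is an illustration only: `struct_size` is a hypothetical helper, and the i32 alignment of 4 bytes is an assumption, since real alignments come from the module's DataLayout string.

```python
def struct_size(fields, packed):
    """Compute a struct size from (size, alignment) pairs, both in bytes."""
    offset = 0
    max_align = 1
    for size, align in fields:
        if not packed:
            # Round the offset up to the field's alignment (inserts padding).
            offset = (offset + align - 1) // align * align
            max_align = max(max_align, align)
        offset += size
    if not packed:
        # Tail padding so the size is a multiple of the struct alignment.
        offset = (offset + max_align - 1) // max_align * max_align
    return offset


I8 = (1, 1)
I32 = (4, 4)   # assumed alignment; the DataLayout may specify otherwise

packed_size = struct_size([I8, I32], packed=True)      # models <{ i8, i32 }>
unpacked_size = struct_size([I8, I32], packed=False)   # models { i8, i32 }
```

Under the assumed alignments, the non-packed struct takes 8 bytes because 3 bytes of padding are inserted after the i8 field.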

Constants

LLVM has several different basic types of constants. This section describes them all and their syntax.

Simple Constants

Boolean constants

The two strings ‘true’ and ‘false’ are both valid constants of the i1 type.

Integer constants

Standard integers (such as ‘4’) are constants of the integer type. They can be either decimal or hexadecimal. Decimal integers can be prefixed with - to represent negative integers, e.g. ‘-1234’. Hexadecimal integers must be prefixed with either u or s to indicate whether they are unsigned or signed respectively, e.g. ‘u0x8000’ gives 32768, whilst ‘s0x8000’ gives -32768.

Note that hexadecimal integers are sign extended from the number of active bits, i.e. the bit width minus the number of leading zeros. So ‘s0x0001’ of type ‘i16’ will be -1, not 1.

Floating-point constants

Floating-point constants use standard decimal notation (e.g. 123.421), exponential notation (e.g. 1.23421e+2), or a more precise hexadecimal notation (see below). The assembler requires the exact decimal value of a floating-point constant. For example, the assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating decimal in binary. Floating-point constants must have a floating-point type.

Null pointer constants

The identifier ‘null’ is recognized as a null pointer constant and must be of pointer type.

Token constants

The identifier ‘none’ is recognized as an empty token constant and must be of token type.
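The sign-extension rule for hexadecimal integer constants described above is easy to get wrong, so here is a Python sketch of it. It models only the wording of this section (sign extension from the number of active bits), not the assembler's implementation, and `parse_hex_integer` is a hypothetical name.

```python
def parse_hex_integer(text):
    """Interpret a u0x.../s0x... constant per the 'active bits' rule."""
    prefix, digits = text[0], text[1:]
    if prefix not in ("u", "s") or not digits.startswith("0x"):
        raise ValueError("expected a constant of the form u0x... or s0x...")
    value = int(digits, 16)
    if prefix == "s" and value != 0:
        # Active bits: the width left after dropping leading zero bits.
        active_bits = value.bit_length()
        # The top active bit is set, so the signed reading is negative.
        value -= 1 << active_bits
    return value
```

Because leading zeros do not count as active bits, `parse_hex_integer("s0x0001")` yields -1 rather than 1, matching the note above.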

The one non-intuitive notation for constants is the hexadecimal form of floating-point constants. For example, the form ‘double 0x432ff973cafa8000’ is equivalent to (but harder to read than) ‘double 4.5e+15’. The only time hexadecimal floating-point constants are required (and the only time that they are generated by the disassembler) is when a floating-point constant must be emitted but it cannot be represented as a decimal floating-point number in a reasonable number of digits. For example, NaN’s, infinities, and other special values are represented in their IEEE hexadecimal format so that assembly and disassembly do not cause any bits to change in the constants.

When using the hexadecimal form, constants of types bfloat, half, float, and double are represented using the 16-digit form shown above (which matches the IEEE754 representation for double); bfloat, half and float values must, however, be exactly representable as bfloat, IEEE 754 half, and IEEE 754 single precision respectively. Hexadecimal format is always used for long double, and there are three forms of long double. The 80-bit format used by x86 is represented as 0xK followed by 20 hexadecimal digits. The 128-bit format used by PowerPC (two adjacent doubles) is represented by 0xM followed by 32 hexadecimal digits. The IEEE 128-bit format is represented by 0xL followed by 32 hexadecimal digits. Long doubles will only work if they match the long double format on your target. The IEEE 16-bit format (half precision) is represented by 0xH followed by 4 hexadecimal digits. The bfloat 16-bit format is represented by 0xR followed by 4 hexadecimal digits. All hexadecimal formats are big-endian (sign bit at the left).
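The 16-digit hexadecimal form is simply the big-endian IEEE 754 binary64 bit pattern, which can be confirmed with Python's standard `struct` module. The sketch below is illustrative only; the helper names are hypothetical, and `is_exact_decimal` models the exactness rule as it is worded in this section rather than the assembler's actual parsing code.

```python
import struct
from fractions import Fraction


def double_from_hex(text):
    """Decode a 16-digit hexadecimal constant as a binary64 value."""
    digits = text[2:] if text.startswith("0x") else text
    if len(digits) != 16:
        raise ValueError("expected 16 hexadecimal digits")
    return struct.unpack(">d", bytes.fromhex(digits))[0]


def double_to_hex(value):
    """Encode a binary64 value in the 16-digit hexadecimal form."""
    return "0x" + struct.pack(">d", value).hex()


def is_exact_decimal(text):
    """True if the decimal text is exactly representable as a binary64 value."""
    return Fraction(text) == Fraction(float(text))
```

For example, `double_from_hex("0x432ff973cafa8000")` gives 4.5e+15, and positive infinity encodes as `0x7ff0000000000000`, a value that has no finite decimal spelling.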

There are no constants of type x86_amx.

Complex Constants

Complex constants are a (potentially recursive) combination of simple constants and smaller complex constants.

Structure constants

Structure constants are represented with notation similar to structure type definitions (a comma separated list of elements, surrounded by braces ({})). For example: “{ i32 4, float 17.0, ptr @G }”, where “@G” is declared as “@G = external global i32”. Structure constants must have structure type, and the number and types of elements must match those specified by the type.

Array constants

Array constants are represented with notation similar to array type definitions (a comma separated list of elements, surrounded by square brackets ([])). For example: “[ i32 42, i32 11, i32 74 ]”. Array constants must have array type, and the number and types of elements must match those specified by the type. As a special case, character array constants may also be represented as a double-quoted string using the c prefix. For example: “c"Hello World\0A\00"”.

Vector constants

Vector constants are represented with notation similar to vector type definitions (a comma separated list of elements, surrounded by less-than/greater-than’s (<>)). For example: “< i32 42, i32 11, i32 74, i32 100 >”. Vector constants must have vector type, and the number and types of elements must match those specified by the type.

When creating a vector whose elements have the same constant value, the preferred syntax is splat (<Ty> Val). For example: “splat (i32 11)”. These vector constants must have vector type with an element type that matches the splat operand.

Zero initialization

The string ‘zeroinitializer’ can be used to zero initialize a value of any type, including scalar and aggregate types. This is often used to avoid having to print large zero initializers (e.g. for large arrays) and is always exactly equivalent to using explicit zero initializers.

Metadata node

A metadata node is a constant tuple without types. For example: “!{ !0, !{ !2, !0 }, !"test" }”. Metadata can reference constant values, for example: “!{ !0, i32 0, ptr @global, ptr @function, !"str" }”. Unlike other typed constants that are meant to be interpreted as part of the instruction stream, metadata is a place to attach additional information such as debug info.

Global Variable and Function Addresses

The addresses of global variables and functions are always implicitly valid (link-time) constants. These constants are explicitly referenced when the identifier for the global is used and always have pointer type. For example, the following is a legal LLVM file:

@X = global i32 17
@Y = global i32 42
@Z = global [2 x ptr] [ ptr @X, ptr @Y ]

Undefined Values

The string ‘undef’ can be used anywhere a constant is expected, and indicates that the user of the value may receive an unspecified bit-pattern. Undefined values may be of any type (other than ‘label’ or ‘void’) and be used anywhere a constant is permitted.

Note

A ‘poison’ value (described in the next section) should be used instead of ‘undef’ whenever possible. Poison values are stronger than undef, and enable more optimizations. Just the existence of ‘undef’ blocks certain optimizations (see the examples below).

Undefined values are useful because they indicate to the compiler that the program is well defined no matter what value is used. This gives the compiler more freedom to optimize. Here are some examples of (potentially surprising) transformations that are valid (in pseudo IR):

  %A = add %X, undef
  %B = sub %X, undef
  %C = xor %X, undef
Safe:
  %A = undef
  %B = undef
  %C = undef

This is safe because all of the output bits are affected by the undef bits. Any output bit can have a zero or one depending on the input bits.

  %A = or %X, undef
  %B = and %X, undef
Safe:
  %A = -1
  %B = 0
Safe:
  %A = %X  ;; By choosing undef as 0
  %B = %X  ;; By choosing undef as -1
Unsafe:
  %A = undef
  %B = undef

These logical operations have bits that are not always affected by the input. For example, if %X has a zero bit, then the output of the ‘and’ operation will always be a zero for that bit, no matter what the corresponding bit from the ‘undef’ is. As such, it is unsafe to optimize or assume that the result of the ‘and’ is ‘undef’. However, it is safe to assume that all bits of the ‘undef’ could be 0, and optimize the ‘and’ to 0. Likewise, it is safe to assume that all the bits of the ‘undef’ operand to the ‘or’ could be set, allowing the ‘or’ to be folded to -1.
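One way to reason about these cases is to enumerate every value the ‘undef’ operand could take and collect the results. The Python sketch below does this for a 4-bit integer; it is a simplified model for illustration (the width, the value of `x`, and the name `possible_results` are all assumptions), not a description of how the optimizer works.

```python
BITS = 4
MASK = (1 << BITS) - 1
ALL_VALUES = range(1 << BITS)


def possible_results(op, x):
    """Set of results of `op(x, undef)` over every choice of the undef operand."""
    return {op(x, u) & MASK for u in ALL_VALUES}


x = 0b0101
add_results = possible_results(lambda a, b: a + b, x)
and_results = possible_results(lambda a, b: a & b, x)
or_results = possible_results(lambda a, b: a | b, x)
```

In this model `add` can produce every 4-bit value, so replacing it with ‘undef’ loses nothing. The `and` and `or` results are strict subsets that always contain 0 and all-ones (-1) respectively, as well as `x` itself, which mirrors the safe and unsafe folds listed above.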

  %A = select undef, %X, %Y
  %B = select undef, 42, %Y
  %C = select %X, %Y, undef
Safe:
  %A = %X     (or %Y)
  %B = 42     (or %Y)
  %C = %Y     (if %Y is provably not poison; unsafe otherwise)
Unsafe:
  %A = undef
  %B = undef
  %C = undef

This set of examples shows that undefined ‘select’ conditions can go either way, but they have to come from one of the two operands. In the %A example, if %X and %Y were both known to have a clear low bit, then %A would have to have a cleared low bit. However, in the %C example, the optimizer is allowed to assume that the ‘undef’ operand could be the same as %Y if %Y is provably not ‘poison’, allowing the whole ‘select’ to be eliminated. This is because ‘poison’ is stronger than ‘undef’.

  %A = xor undef, undef

  %B = undef
  %C = xor %B, %B

  %D = undef
  %E = icmp slt %D, 4
  %F = icmp sge %D, 4

Safe:
  %A = undef
  %B = undef
  %C = undef
  %D = undef
  %E = undef
  %F = undef

This example points out that two ‘undef’ operands are not necessarily the same. This can be surprising to people (and also matches C semantics) where they assume that “X^X” is always zero, even if X is undefined. This isn’t true for a number of reasons, but the short answer is that an ‘undef’ “variable” can arbitrarily change its value over its “live range”. This is true because the variable doesn’t actually have a live range. Instead, the value is logically read from arbitrary registers that happen to be around when needed, so the value is not necessarily consistent over time. In fact, %A and %C need to have the same semantics or the core LLVM “replace all uses with” concept would not hold.

To ensure all uses of a given register observe the same value (even if ‘undef’), the freeze instruction can be used.
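The difference between an unfrozen and a frozen ‘undef’ can be sketched by enumeration in Python. This is a simplified 4-bit model for illustration only: each use of an unfrozen ‘undef’ is allowed to pick its own value, while a frozen value is picked once and shared by all uses.

```python
BITS = 4
ALL_VALUES = range(1 << BITS)

# xor %B, %B with %B = undef: the two uses may observe different values.
unfrozen_xor = {a ^ b for a in ALL_VALUES for b in ALL_VALUES}

# xor %F, %F with %F = freeze undef: both uses observe the same value.
frozen_xor = {v ^ v for v in ALL_VALUES}
```

The unfrozen set contains every 4-bit value, so folding the ‘xor’ to ‘undef’ is consistent with the model, whereas the frozen set contains only 0.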

  %A = sdiv undef, %X
  %B = sdiv %X, undef
Safe:
  %A = 0
b: unreachable

These examples show the crucial difference between an undefined value and undefined behavior. An undefined value (like ‘undef’) is allowed to have an arbitrary bit-pattern. This means that the %A operation can be constant folded to ‘0’, because the ‘undef’ could be zero, and zero divided by any value is zero. However, in the second example, we can make a more aggressive assumption: because the undef is allowed to be an arbitrary value, we are allowed to assume that it could be zero. Since a divide by zero has undefined behavior, we are allowed to assume that the operation does not execute at all. This allows us to delete the divide and all code after it. Because the undefined operation “can’t happen”, the optimizer can assume that it occurs in dead code.

a:  store undef -> %X
b:  store %X -> undef
Safe:
a: <deleted>     (if the stored value in %X is provably not poison)
b: unreachable

A store of an undefined value can be assumed to not have any effect; we can assume that the value is overwritten with bits that happen to match what was already there. This argument is only valid if the stored value is provably not poison. However, a store to an undefined location could clobber arbitrary memory, therefore, it has undefined behavior.

Branching on an undefined value is undefined behavior. This explains optimizations that depend on branch conditions to construct predicates, such as Correlated Value Propagation and Global Value Numbering. In case of switch instruction, the branch condition should be frozen, otherwise it is undefined behavior.

Unsafe:
  br undef, BB1, BB2 ; UB

  %X = and i32 undef, 255
  switch %X, label %ret [ .. ] ; UB

  store undef, ptr %ptr
  %X = load ptr %ptr ; %X is undef
  switch i8 %X, label %ret [ .. ] ; UB

Safe:
  %X = or i8 undef, 255 ; always 255
  switch i8 %X, label %ret [ .. ] ; Well-defined

  %X = freeze i1 undef
  br %X, BB1, BB2 ; Well-defined (non-deterministic jump)

Poison Values

A poison value is a result of an erroneous operation. In order to facilitate speculative execution, many instructions do not invoke immediate undefined behavior when provided with illegal operands, and return a poison value instead. The string ‘poison’ can be used anywhere a constant is expected, and operations such as add with the nsw flag can produce a poison value.

Most instructions return ‘poison’ when one of their arguments is ‘poison’. A notable exception is the select instruction. Propagation of poison can be stopped with the freeze instruction.
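The following sketch illustrates both escape hatches mentioned above. The function name @stop_poison and its parameters are hypothetical, and the comments state the intended reading rather than an authoritative one:

```llvm
; Hypothetical example: names are illustrative only.
define i32 @stop_poison(i32 %a, i1 %c) {
entry:
  ; %p is poison if the signed addition overflows.
  %p = add nsw i32 %a, 1
  ; select only propagates poison from the arm it actually selects
  ; (or from the condition), so %s is 0 whenever %c is false.
  %s = select i1 %c, i32 %p, i32 0
  ; freeze turns a poison %p into an arbitrary but fixed i32 value.
  %f = freeze i32 %p
  ; %r is poison only if %c is true and %p is poison.
  %r = add i32 %s, %f
  ret i32 %r
}
```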

It is correct to replace a poison value with an undef value or any value of the type.

This means that immediate undefined behavior occurs if a poison value is used as an instruction operand that has any values that trigger undefined behavior. Notably this includes (but is not limited to):

  • The pointer operand of a load, store or any other pointer dereferencing instruction (independent of address space).

  • The divisor operand of a udiv, sdiv, urem or srem instruction.

  • The condition operand of a br instruction.

  • The callee operand of a call or invoke instruction.

  • The parameter operand of a call or invoke instruction, when the function or invoking call site has a noundef attribute in the corresponding position.

  • The operand of a ret instruction if the function or invoking call site has a noundef attribute in the return value position.

Here are some examples:

entry:
  %poison = sub nuw i32 0, 1           ; Results in a poison value.
  %poison2 = sub i32 poison, 1         ; Also results in a poison value.
  %still_poison = and i32 %poison, 0   ; 0, but also poison.
  %poison_yet_again = getelementptr i32, ptr @h, i32 %still_poison
  store i32 0, ptr %poison_yet_again   ; Undefined behavior due to
                                       ; store to poison.

  store i32 %poison, ptr @g            ; Poison value stored to memory.
  %poison3 = load i32, ptr @g          ; Poison value loaded back from memory.

  %poison4 = load i16, ptr @g          ; Returns a poison value.
  %poison5 = load i64, ptr @g          ; Returns a poison value.

  %cmp = icmp slt i32 %poison, 0       ; Returns a poison value.
  br i1 %cmp, label %end, label %end   ; undefined behavior

end:

Well-Defined Values

Given a program execution, a value is well defined if the value does not have an undef bit and is not poison in the execution. An aggregate value or vector is well defined if its elements are well defined. The padding of an aggregate isn’t considered, since it isn’t visible without storing it into memory and loading it with a different type.

A constant of a single value, non-vector type is well defined if it is neither an ‘undef’ constant nor a ‘poison’ constant. The result of a freeze instruction is well defined regardless of its operand.

Addresses of Basic Blocks

blockaddress(@function, %block)

The ‘blockaddress’ constant computes the address of the specified basic block in the specified function.

It always has a ptr addrspace(P) type, where P is the address space of the function containing %block (usually addrspace(0)).

Taking the address of the entry block is illegal.

This value only has defined behavior when used as an operand to the ‘indirectbr’ or for comparisons against null. Pointer equality tests between label addresses result in undefined behavior; though, again, comparison against null is ok, and no label is equal to the null pointer. This may be passed around as an opaque pointer sized value as long as the bits are not inspected. This allows ptrtoint and arithmetic to be performed on these values so long as the original value is reconstituted before the indirectbr instruction.

Finally, some targets may provide defined semantics when using the value as the operand to an inline assembly, but that is target specific.
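A small sketch of the intended use, feeding ‘blockaddress’ constants to indirectbr through a jump table. The names @targets and @dispatch are hypothetical, and the example assumes %i is always 0 or 1:

```llvm
; Hypothetical example: a two-entry table of block addresses.
@targets = private constant [2 x ptr] [
  ptr blockaddress(@dispatch, %bb1),
  ptr blockaddress(@dispatch, %bb2)
]

define i32 @dispatch(i64 %i) {
entry:
  ; Assumes %i is 0 or 1; any other index is out of bounds.
  %slot = getelementptr inbounds [2 x ptr], ptr @targets, i64 0, i64 %i
  %dest = load ptr, ptr %slot
  ; Every possible destination must be listed here.
  indirectbr ptr %dest, [label %bb1, label %bb2]

bb1:
  ret i32 1

bb2:
  ret i32 2
}
```

Note that neither table entry refers to the entry block, whose address may not be taken.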

DSO Local Equivalent

dso_local_equivalent @func

A ‘dso_local_equivalent’ constant represents a function which is functionally equivalent to a given function, but is always defined in the current linkage unit. The resulting pointer has the same type as the underlying function. The resulting pointer is permitted, but not required, to be different from a pointer to the function, and it may have different values in different translation units.

The target function may not have extern_weak linkage.

dso_local_equivalent can be implemented as such:

  • If the function has local linkage, hidden visibility, or is dso_local, dso_local_equivalent can be implemented as simply a pointer to the function.

  • dso_local_equivalent can be implemented with a stub that tail-calls the function. Many targets support relocations that resolve at link time to either a function or a stub for it, depending on if the function is defined within the linkage unit; LLVM will use this when available. (This is commonly called a “PLT stub”.) On other targets, the stub may need to be emitted explicitly.

This can be used wherever a dso_local instance of a function is needed without needing to explicitly make the original function dso_local. An instance where this can be used is for static offset calculations between a function and some other dso_local symbol. This is especially useful for the Relative VTables C++ ABI, where dynamic relocations for function pointers in VTables can be replaced with static relocations for offsets between the VTable and virtual functions which may not be dso_local.

This is currently only supported for ELF binary formats.
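As a rough sketch of the relative-VTable use case described above, the following stores a 32-bit offset from a table to a function that is not itself dso_local. The names @virt and @vtable are hypothetical, and the one-entry layout is a simplification rather than the layout any real ABI uses:

```llvm
; Hypothetical example: @virt may be defined in another linkage unit.
declare void @virt()

; The entry is the distance from @vtable to a dso_local equivalent of
; @virt, which can be resolved with a static relocation.
@vtable = dso_local unnamed_addr constant [1 x i32] [
  i32 trunc (i64 sub (i64 ptrtoint (ptr dso_local_equivalent @virt to i64),
                      i64 ptrtoint (ptr @vtable to i64)) to i32)
]
```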

No CFI

no_cfi @func

With Control-Flow Integrity (CFI), a ‘no_cfi’ constant represents a function reference that does not get replaced with a reference to the CFI jump table in the LowerTypeTests pass. These constants may be useful in low-level programs, such as operating system kernels, which need to refer to the actual function body.

Pointer Authentication Constants

ptrauth (ptr CST, i32 KEY[, i64 DISC[, ptr ADDRDISC]?]?)

A ‘ptrauth’ constant represents a pointer with a cryptographic authentication signature embedded into some bits, as described in the Pointer Authentication document.

A ‘ptrauth’ constant is simply a constant equivalent to the llvm.ptrauth.sign intrinsic, potentially fed by a discriminator llvm.ptrauth.blend if needed.

Its type is the same as the first argument. An integer constant discriminator and an address discriminator may be optionally specified. Otherwise, they have values i64 0 and ptr null.

If the address discriminator is null then the expression is equivalent to:

  %tmp = call i64 @llvm.ptrauth.sign(i64 ptrtoint (ptr CST to i64), i32 KEY, i64 DISC)
  %val = inttoptr i64 %tmp to ptr

Otherwise, the expression is equivalent to:

  %tmp1 = call i64 @llvm.ptrauth.blend(i64 ptrtoint (ptr ADDRDISC to i64), i64 DISC)
  %tmp2 = call i64 @llvm.ptrauth.sign(i64 ptrtoint (ptr CST to i64), i32 KEY, i64 %tmp1)
  %val = inttoptr i64 %tmp2 to ptr
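For illustration, the two forms might appear in global initializers as sketched below. The names @callee, @signed_fp and @signed_fp_addr are hypothetical, and the key 0 and discriminator 42 are arbitrary placeholders whose meaning is target specific:

```llvm
; Hypothetical example: key and discriminator values are placeholders.
declare void @callee()

; Signed with an integer discriminator only (address discriminator is null).
@signed_fp = global ptr ptrauth (ptr @callee, i32 0, i64 42)

; Signed with the integer discriminator blended with the storage address.
@signed_fp_addr = global ptr ptrauth (ptr @callee, i32 0, i64 42, ptr @signed_fp_addr)
```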

Constant Expressions

Constant expressions are used to allow expressions involving other constants to be used as constants. Constant expressions may be of any first class type and may involve any LLVM operation that does not have side effects (e.g. load and call are not supported). The following is the syntax for constant expressions:

trunc (CST to TYPE)

Perform the trunc operation on constants.

ptrtoint (CST to TYPE)

Perform the ptrtoint operation on constants.

inttoptr (CST to TYPE)

Perform the inttoptr operation on constants. This one is really dangerous!

bitcast (CST to TYPE)

Convert a constant, CST, to another TYPE. The constraints of the operands are the same as those for the bitcast instruction.

addrspacecast (CST to TYPE)

Convert a constant pointer or constant vector of pointer, CST, to another TYPE in a different address space. The constraints of the operands are the same as those for the addrspacecast instruction.

getelementptr (TY, CSTPTR, IDX0, IDX1, ...), getelementptr inbounds (TY, CSTPTR, IDX0, IDX1, ...)

Perform the getelementptr operation on constants. As with the getelementptr instruction, the index list may have one or more indexes, which are required to make sense for the type of “pointer to TY”. These indexes may be implicitly sign-extended or truncated to match the index size of CSTPTR’s address space.

extractelement (VAL, IDX)

Perform the extractelement operation on constants.

insertelement (VAL, ELT, IDX)

Perform the insertelement operation on constants.

shufflevector (VEC1, VEC2, IDXMASK)

Perform the shufflevector operation on constants.

add (LHS, RHS)

Perform an addition on constants.

sub (LHS, RHS)

Perform a subtraction on constants.

xor (LHS, RHS)

Perform a bitwise xor on constants.
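A short sketch of constant expressions used as global initializers, where no instruction can run. The global names @arr, @third and @addr_plus_8 are made up for this example:

```llvm
; Hypothetical example: globals initialized with constant expressions.
@arr = global [4 x i32] [i32 1, i32 2, i32 3, i32 4]

; Address of the third element of @arr, computed as a constant.
@third = global ptr getelementptr inbounds ([4 x i32], ptr @arr, i64 0, i64 2)

; The address of @arr as an integer, plus 8, still a constant.
@addr_plus_8 = global i64 add (i64 ptrtoint (ptr @arr to i64), i64 8)
```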

Other Values

Inline Assembler Expressions

LLVM supports inline assembler expressions (as opposed to Module-Level Inline Assembly) through the use of a special value. This value represents the inline assembler as a template string (containing the instructions to emit), a list of operand constraints (stored as a string), a flag that indicates whether or not the inline asm expression has side effects, and a flag indicating whether the function containing the asm needs to align its stack conservatively.

The template string supports argument substitution of the operands using “$” followed by a number, to indicate substitution of the given register/memory location, as specified by the constraint string. “${NUM:MODIFIER}” may also be used, where MODIFIER is a target-specific annotation for how to print the operand (See Asm template argument modifiers).

A literal “$” may be included by using “$$” in the template. To include other special characters into the output, the usual “\XX” escapes may be used, just as in other strings. Note that after template substitution, the resulting assembly string is parsed by LLVM’s integrated assembler unless it is disabled – even when emitting a .s file – and thus must contain assembly syntax known to LLVM.

LLVM also supports a few more substitutions useful for writing inline assembly:

  • ${:uid}: Expands to a decimal integer unique to this inline assembly blob. This substitution is useful when declaring a local label. Many standard compiler optimizations, such as inlining, may duplicate an inline asm blob. Adding a blob-unique identifier ensures that the two labels will not conflict during assembly. This is used to implement GCC’s %= special format string.

  • ${:comment}: Expands to the comment character of the current target’s assembly dialect. This is usually #, but many targets use other strings, such as ;, //, or !.

  • ${:private}: Expands to the assembler private label prefix. Labels with this prefix will not appear in the symbol table of the assembled object. Typically the prefix is L, but targets may use other strings. .L is relatively popular.
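The substitutions above might be combined as in the following sketch, which assumes an x86 target with AT&T syntax. The function name @countdown is hypothetical, and the loop assumes %n is greater than zero:

```llvm
; Hypothetical x86 example: decrement the operand until it reaches zero.
define i32 @countdown(i32 %n) {
entry:
  ; ${:private} and ${:uid} together form a private label that stays
  ; unique even if this asm blob is duplicated, e.g. by inlining.
  ; \0A is a newline and \09 is a tab in the template string.
  %r = call i32 asm sideeffect "${:private}loop${:uid}:\0A\09decl $0\0A\09jnz ${:private}loop${:uid}", "=r,0,~{flags}"(i32 %n)
  ret i32 %r
}
```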

LLVM’s support for inline asm is modeled closely on the requirements of Clang’s GCC-compatible inline-asm support. Thus, the feature-set and the constraint and modifier codes listed here are similar or identical to those in GCC’s inline asm support. However, to be clear, the syntax of the template and constraint strings described here is not the same as the syntax accepted by GCC and Clang, and, while most constraint letters are passed through as-is by Clang, some get translated to other codes when converting from the C source to the LLVM assembly.

An example inline assembler expression is:

i32 (i32) asm "bswap $0", "=r,r"

Inline assembler expressions may only be used as the callee operand of a call or an invoke instruction. Thus, typically we have:

%X = call i32 asm "bswap $0", "=r,r"(i32 %Y)

Inline asms with side effects not visible in the constraint list must be marked as having side effects. This is done through the use of the ‘sideeffect’ keyword, like so:

call void asm sideeffect "eieio", ""()

In some cases inline asms will contain code that will not work unless the stack is aligned in some way, such as calls or SSE instructions on x86, yet will not contain code that does that alignment within the asm. The compiler should make conservative assumptions about what the asm might contain and should generate its usual stack alignment code in the prologue if the ‘alignstack’ keyword is present:

call void asm alignstack "eieio", ""()

Inline asms also support using non-standard assembly dialects. The assumed dialect is ATT. When the ‘inteldialect’ keyword is present, the inline asm is using the Intel dialect. Currently, ATT and Intel are the only supported dialects. An example is:

call void asm inteldialect "eieio", ""()

In the case that the inline asm might unwind the stack, the ‘unwind’ keyword must be used, so that the compiler emits unwinding information:

call void asm unwind "call func", ""()

If the inline asm unwinds the stack and isn’t marked with the ‘unwind’ keyword, the behavior is undefined.

If multiple keywords appear, the ‘sideeffect’ keyword must come first, the ‘alignstack’ keyword second, the ‘inteldialect’ keyword third and the ‘unwind’ keyword last.
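A sketch of several keywords in the required order, assuming an x86 target. The function name @full_barrier is hypothetical, and mfence was chosen only because it reads the same in both dialects:

```llvm
; Hypothetical x86 example: keywords appear as sideeffect, alignstack,
; inteldialect (unwind, if present, would come last).
define void @full_barrier() {
entry:
  call void asm sideeffect alignstack inteldialect "mfence", "~{memory}"()
  ret void
}
```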

Inline Asm Constraint String

The constraint list is a comma-separated string, each element containing one or more constraint codes.

For each element in the constraint list an appropriate register or memory operand will be chosen, and it will be made available to assembly template string expansion as $0 for the first constraint in the list, $1 for the second, etc.

There are three different types of constraints, which are distinguished by a prefix symbol in front of the constraint code: Output, Input, and Clobber. The constraints must always be given in that order: outputs first, then inputs, then clobbers. They cannot be intermingled.

There are also three different categories of constraint codes:

  • Register constraint. This is either a register class, or a fixed physical register. This kind of constraint will allocate a register, and if necessary, bitcast the argument or result to the appropriate type.

  • Memory constraint. This kind of constraint is for use with an instruction taking a memory operand. Different constraints allow for different addressing modes used by the target.

  • Immediate value constraint. This kind of constraint is for an integer or other immediate value which can be rendered directly into an instruction. The various target-specific constraints allow the selection of a value in the proper range for the instruction you wish to use it with.

Output constraints

Output constraints are specified by an “=” prefix (e.g. “=r”). This indicates that the assembly will write to this operand, and the operand will then be made available as a return value of the asm expression. Output constraints do not consume an argument from the call instruction. (Except, see below about indirect outputs).

Normally, it is expected that no output locations are written to by the assembly expression until all of the inputs have been read. As such, LLVM may assign the same register to an output and an input. If this is not safe (e.g. if the assembly contains two instructions, where the first writes to one output, and the second reads an input and writes to a second output), then the “&” modifier must be used (e.g. “=&r”) to specify that the output is an “early-clobber” output. Marking an output as “early-clobber” ensures that LLVM will not use the same register for any inputs (other than an input tied to this output).
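To make the early-clobber case concrete, here is a sketch assuming an x86 target with AT&T syntax; the function name @add_two is hypothetical. The first instruction writes the output before the second input has been read, so the output is marked “=&r”:

```llvm
; Hypothetical x86 example: $0 is the output, $1 and $2 are inputs.
define i32 @add_two(i32 %a, i32 %b) {
entry:
  ; movl writes $0 while $2 is still needed by the following addl,
  ; so $0 must not share a register with either input.
  %r = call i32 asm "movl $1, $0\0A\09addl $2, $0", "=&r,r,r,~{flags}"(i32 %a, i32 %b)
  ret i32 %r
}
```

With a plain “=r” output, LLVM would be free to assign $0 and $2 to the same register, and the movl would then overwrite %b before it is added.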

Input constraints

Input constraints do not have a prefix – just the constraint codes. Each input constraint will consume one argument from the call instruction. It is not permitted for the asm to write to any input register or memory location (unless that input is tied to an output). Note also that multiple inputs may all be assigned to the same register, if LLVM can determine that they necessarily all contain the same value.

Instead of providing a Constraint Code, input constraints may also “tie” themselves to an output constraint, by providing an integer as the constraint string. Tied inputs still consume an argument from the call instruction, and take up a position in the asm template numbering as is usual – they will simply be constrained to always use the same register as the output they’ve been tied to. For example, a constraint string of “=r,0” says to assign a register for output, and use that register as an input as well (it being the 0’th constraint).

It is permitted to tie an input to an “early-clobber” output. In that case, no other input may share the same register as the input tied to the early-clobber (even when the other input has the same value).

You may only tie an input to an output which has a register constraint, not a memory constraint. Only a single input may be tied to an output.
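A sketch of a tied input, assuming an x86 target; the function name @swap_bytes is hypothetical. Because bswap modifies its single operand in place, the input is tied to the output with the constraint string “=r,0”:

```llvm
; Hypothetical x86 example: the input %y is placed in the same
; register that is used for the output, then swapped in place.
define i32 @swap_bytes(i32 %y) {
entry:
  %x = call i32 asm "bswap $0", "=r,0"(i32 %y)
  ret i32 %x
}
```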

There is also an “interesting” feature which deserves a bit of explanation: if a register class constraint allocates a register which is too small for the value type operand provided as input, the input value will be split into multiple registers, and all of them passed to the inline asm.

However, this feature is often not as useful as you might think.

Firstly, the registers are not guaranteed to be consecutive. So, on those architectures that have instructions which operate on multiple consecutive registers, this is not an appropriate way to support them. (e.g. the 32-bit SparcV8 has a 64-bit load, which instruction takes a single 32-bit register. The hardware then loads into both the named register, and the next register. This feature of inline asm would not be useful to support that.)

A few of the targets provide a template string modifier allowing explicit access to the second register of a two-register operand (e.g. MIPS L, M, and D). On such an architecture, you can actually access the second allocated register (yet, still, not any subsequent ones). But, in that case, you’re still probably better off simply splitting the value into two separate operands, for clarity. (e.g. see the description of the A constraint on X86, which, despite existing only for use with this feature, is not really a good idea to use)

Indirect inputs and outputs

Indirect output or input constraints can be specified by the “*” modifier (which goes after the “=” in case of an output). This indicates that the asm will write to or read from the contents of an address provided as an input argument. (Note that in this way, indirect outputs act more like an input than an output: just like an input, they consume an argument of the call expression, rather than producing a return value. An indirect output constraint is an “output” only in that the asm is expected to write to the contents of the input memory location, instead of just read from it).

This is most typically used for memory constraints, e.g. “=*m”, to pass the address of a variable as a value.

It is also possible to use an indirect register constraint, but only on output (e.g. “=*r”). This will cause LLVM to allocate a register for an output value normally, and then, separately emit a store to the address provided as input, after the provided inline asm. (It’s not clear what value this functionality provides, compared to writing the store explicitly after the asm statement, and it can only produce worse code, since it bypasses many optimization passes. I would recommend not using it.)

Call arguments for indirect constraints must have pointer type and must specify the elementtype attribute to indicate the pointer element type.
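A sketch of an indirect memory output, assuming an x86 target with AT&T syntax; the function name @store_42 is hypothetical. The call produces no return value, and the pointer argument carries the elementtype attribute:

```llvm
; Hypothetical x86 example: the asm writes through the pointer %p.
define void @store_42(ptr %p) {
entry:
  ; "$$42" emits the literal immediate $42; $0 is the memory operand.
  call void asm sideeffect "movl $$42, $0", "=*m"(ptr elementtype(i32) %p)
  ret void
}
```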

Clobber constraints

A clobber constraint is indicated by a “~” prefix. A clobber does not consume an input operand, nor generate an output. Clobbers cannot use any of the general constraint code letters – they may use only explicit register constraints, e.g. “~{eax}”. The one exception is that a clobber string of “~{memory}” indicates that the assembly writes to arbitrary undeclared memory locations – not only the memory pointed to by a declared indirect output.

Note that clobbering named registers that are also present in output constraints is not legal.

Label constraints

A label constraint is indicated by a “!” prefix and typically used in the form "!i". Instead of consuming call arguments, label constraints consume indirect destination labels of callbr instructions.

Label constraints can only be used in conjunction with callbr and the number of label constraints must match the number of indirect destination labels in the callbr instruction.
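A sketch of a single label constraint paired with a single indirect destination, assuming an x86 target. The function and label names are hypothetical, and the ${0:l} operand modifier (print the operand as a bare label) is a target-specific assumption:

```llvm
; Hypothetical x86 example: one "!i" constraint, one indirect label.
define i32 @maybe_jump() {
entry:
  callbr void asm sideeffect "jmp ${0:l}", "!i"()
          to label %fallthrough [label %indirect]

fallthrough:
  ret i32 0

indirect:
  ret i32 1
}
```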

Constraint Codes

After a potential prefix comes the constraint code, or codes.

A Constraint Code is either a single letter (e.g. “r”), a “^” character followed by two letters (e.g. “^wc”), or “{” register-name “}” (e.g. “{eax}”).

The one and two letter constraint codes are typically chosen to be the same as GCC’s constraint codes.

A single constraint may include one or more than one constraint code in it, leaving it up to LLVM to choose which one to use. This is included mainly for compatibility with the translation of GCC inline asm coming from clang.

There are two ways to specify alternatives, and either or both may be used in an inline asm constraint list:

  1. Append the codes to each other, making a constraint code set. E.g. “im” or “{eax}m”. This means “choose any of the options in the set”. The choice of constraint is made independently for each constraint in the constraint list.

  2. Use “|” between constraint code sets, creating alternatives. Every constraint in the constraint list must have the same number of alternative sets. With this syntax, the same alternative in all of the items in the constraint list will be chosen together.

Putting those together, you might have a two operand constraint string like "rm|r,ri|rm". This indicates that if operand 0 is r or m, then operand 1 may be one of r or i. If operand 0 is r, then operand 1 may be one of r or m. But, operand 0 and 1 cannot both be of type m.

However, the use of either of the alternatives features is NOT recommended, as LLVM is not able to make an intelligent choice about which one to use. (At the point it currently needs to choose, not enough information is available to do so in a smart way.) Thus, it simply tries to make a choice that’s most likely to compile, not one that will be optimal performance. (e.g., given “rm”, it’ll always choose to use memory, not registers). And, if given multiple registers, or multiple register classes, it will simply choose the first one. (In fact, it doesn’t currently even ensure explicitly specified physical registers are unique, so specifying multiple physical registers as alternatives, like {r11}{r12},{r11}{r12}, will assign r11 to both operands, not at all what was intended.)

Supported Constraint Code List

The constraint codes are, in general, expected to behave the same way they do in GCC. LLVM’s support is often implemented on an ‘as-needed’ basis, to support C inline asm code which was supported by GCC. A mismatch in behavior between LLVM and GCC likely indicates a bug in LLVM.

Some constraint codes are typically supported by all targets:

  • r: A register in the target’s general purpose register class.

  • m: A memory address operand. It is target-specific what addressing modes are supported, typical examples are register, or register + register offset, or register + immediate offset (of some target-specific size).

  • p: An address operand. Similar to m, but used by “load address” type instructions without touching memory.

  • i: An integer constant (of target-specific width). Allows either a simple immediate, or a relocatable value.

  • n: An integer constant – not including relocatable values.

  • s: A symbol or label reference with a constant offset.

  • X: Allows an operand of any kind, no constraint whatsoever. Typically useful to pass a label for an asm branch or call.

  • {register-name}: Requires exactly the named physical register.

Other constraints are target-specific:

AArch64:

  • z: An immediate integer 0. Outputs WZR or XZR, as appropriate.

  • I: An immediate integer valid for an ADD or SUB instruction, i.e. 0 to 4095 with optional shift by 12.

  • J: An immediate integer that, when negated, is valid for an ADD or SUB instruction, i.e. -1 to -4095 with optional left shift by 12.

  • K: An immediate integer that is valid for the ‘bitmask immediate 32’ of a logical instruction like AND, EOR, or ORR with a 32-bit register.

  • L: An immediate integer that is valid for the ‘bitmask immediate 64’ of a logical instruction like AND, EOR, or ORR with a 64-bit register.

  • M: An immediate integer for use with the MOV assembly alias on a 32-bit register. This is a superset of K: in addition to the bitmask immediate, also allows immediate integers which can be loaded with a single MOVZ or MOVN instruction.

  • N: An immediate integer for use with the MOV assembly alias on a 64-bit register. This is a superset of L.

  • Q: Memory address operand must be in a single register (no offsets). (However, LLVM currently does this for the m constraint as well.)

  • r: A 32 or 64-bit integer register (W* or X*).

  • S: A symbol or label reference with a constant offset. The generic s is not supported.

  • Uci: Like r, but restricted to registers 8 to 11 inclusive.

  • Ucj: Like r, but restricted to registers 12 to 15 inclusive.

  • w: A 32, 64, or 128-bit floating-point, SIMD or SVE vector register.

  • x: Like w, but restricted to registers 0 to 15 inclusive.

  • y: Like w, but restricted to SVE vector registers Z0 to Z7 inclusive.

  • Uph: One of the upper eight SVE predicate registers (P8 to P15)

  • Upl: One of the lower eight SVE predicate registers (P0 to P7)

  • Upa: Any of the SVE predicate registers (P0 to P15)

AMDGPU:

  • r: A 32 or 64-bit integer register.

  • [0-9]v: The 32-bit VGPR register, number 0-9.

  • [0-9]s: The 32-bit SGPR register, number 0-9.

  • [0-9]a: The 32-bit AGPR register, number 0-9.

  • I: An integer inline constant in the range from -16 to 64.

  • J: A 16-bit signed integer constant.

  • A: An integer or a floating-point inline constant.

  • B: A 32-bit signed integer constant.

  • C: A 32-bit unsigned integer constant or an integer inline constant in the range from -16 to 64.

  • DA: A 64-bit constant that can be split into two “A” constants.

  • DB: A 64-bit constant that can be split into two “B” constants.

All ARM modes:

  • Q, Um, Un, Uq, Us, Ut, Uv, Uy: Memory address operand. Treated the same as operand m, at the moment.

  • Te: An even general-purpose 32-bit integer register: r0, r2, ..., r12, r14

  • To: An odd general-purpose 32-bit integer register: r1, r3, ..., r11

ARM and ARM’s Thumb2 mode:

  • j: An immediate integer between 0 and 65535 (valid for MOVW)

  • I: An immediate integer valid for a data-processing instruction.

  • J: An immediate integer between -4095 and 4095.

  • K: An immediate integer whose bitwise inverse is valid for a data-processing instruction. (Can be used with template modifier “B” to print the inverted value).

  • L: An immediate integer whose negation is valid for a data-processing instruction. (Can be used with template modifier “n” to print the negated value).

  • M: A power of two or an integer between 0 and 32.

  • N: Invalid immediate constraint.

  • O: Invalid immediate constraint.

  • r: A general-purpose 32-bit integer register (r0-r15).

  • l: In Thumb2 mode, low 32-bit GPR registers (r0-r7). In ARM mode, same as r.

  • h: In Thumb2 mode, a high 32-bit GPR register (r8-r15). In ARM mode, invalid.

  • w: A 32, 64, or 128-bit floating-point/SIMD register in the ranges s0-s31, d0-d31, or q0-q15, respectively.

  • t: A 32, 64, or 128-bit floating-point/SIMD register in the ranges s0-s31, d0-d15, or q0-q7, respectively.

  • x: A 32, 64, or 128-bit floating-point/SIMD register in the ranges s0-s15, d0-d7, or q0-q3, respectively.

ARM’s Thumb1 mode:

  • I: An immediate integer between 0 and 255.

  • J: An immediate integer between -255 and -1.

  • K: An immediate integer between 0 and 255, with optional left-shift bysome amount.

  • L: An immediate integer between -7 and 7.

  • M: An immediate integer which is a multiple of 4 between 0 and 1020.

  • N: An immediate integer between 0 and 31.

  • O: An immediate integer which is a multiple of 4 between -508 and 508.

  • r: A low 32-bit GPR register (r0-r7).

  • l: A low 32-bit GPR register (r0-r7).

  • h: A high GPR register (r8-r15).

  • w: A 32, 64, or 128-bit floating-point/SIMD register in the ranges s0-s31, d0-d31, or q0-q15, respectively.

  • t: A 32, 64, or 128-bit floating-point/SIMD register in the ranges s0-s31, d0-d15, or q0-q7, respectively.

  • x: A 32, 64, or 128-bit floating-point/SIMD register in the ranges s0-s15, d0-d7, or q0-q3, respectively.

Hexagon:

  • o, v: A memory address operand, treated the same as constraint m, at the moment.

  • r: A 32 or 64-bit register.

LoongArch:

  • f: A floating-point register (if available).

  • k: A memory operand whose address is formed by a base register and (optionally scaled) index register.

  • l: A signed 16-bit constant.

  • m: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as st.w and ld.w.

  • q: A general-purpose register except for $r0 and $r1 (for the csrxchg instruction).

  • I: A signed 12-bit constant (for arithmetic instructions).

  • J: An immediate integer zero.

  • K: An unsigned 12-bit constant (for logic instructions).

  • ZB: An address that is held in a general-purpose register. The offset is zero.

  • ZC: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ll.w and sc.w.

MSP430:

  • r: An 8 or 16-bit register.

MIPS:

  • I: An immediate signed 16-bit integer.

  • J: An immediate integer zero.

  • K: An immediate unsigned 16-bit integer.

  • L: An immediate 32-bit integer, where the lower 16 bits are 0.

  • N: An immediate integer between -65535 and -1.

  • O: An immediate signed 15-bit integer.

  • P: An immediate integer between 1 and 65535.

  • m: A memory address operand. In MIPS-SE mode, allows a base address register plus 16-bit immediate offset. In MIPS mode, just a base register.

  • R: A memory address operand. In MIPS-SE mode, allows a base address register plus a 9-bit signed offset. In MIPS mode, the same as constraint m.

  • ZC: A memory address operand, suitable for use in a pref, ll, or sc instruction on the given subtarget (details vary).

  • r, d, y: A 32 or 64-bit GPR register.

  • f: A 32 or 64-bit FPU register (F0-F31), or a 128-bit MSA register (W0-W31). In the case of MSA registers, it is recommended to use the w argument modifier for compatibility with GCC.

  • c: A 32-bit or 64-bit GPR register suitable for indirect jump (always 25).

  • l: The lo register, 32 or 64-bit.

  • x: Invalid.

NVPTX:

  • b: A 1-bit integer register.

  • c or h: A 16-bit integer register.

  • r: A 32-bit integer register.

  • l or N: A 64-bit integer register.

  • q: A 128-bit integer register.

  • f: A 32-bit float register.

  • d: A 64-bit float register.

PowerPC:

  • I: An immediate signed 16-bit integer.

  • J: An immediate unsigned 16-bit integer, shifted left 16 bits.

  • K: An immediate unsigned 16-bit integer.

  • L: An immediate signed 16-bit integer, shifted left 16 bits.

  • M: An immediate integer greater than 31.

  • N: An immediate integer that is an exact power of 2.

  • O: The immediate integer constant 0.

  • P: An immediate integer constant whose negation is a signed 16-bitconstant.

  • es, o, Q, Z, Zy: A memory address operand, currently treated the same as m.

  • r: A 32 or 64-bit integer register.

  • b: A 32 or 64-bit integer register, excluding R0 (that is: R1-R31).

  • f: A 32 or 64-bit float register (F0-F31).

  • v: For 4 x f32 or 4 x f64 types, a 128-bit altivec vector register (V0-V31).

  • y: Condition register (CR0-CR7).

  • wc: An individual CR bit in a CR register.

  • wa, wd, wf: Any 128-bit VSX vector register, from the full VSX register set (overlapping both the floating-point and vector register files).

  • ws: A 32 or 64-bit floating-point register, from the full VSX register set.

RISC-V:

  • A: An address operand (using a general-purpose register, without an offset).

  • I: A 12-bit signed integer immediate operand.

  • J: A zero integer immediate operand.

  • K: A 5-bit unsigned integer immediate operand.

  • f: A 32- or 64-bit floating-point register (requires F or D extension).

  • r: A 32- or 64-bit general-purpose register (depending on the platform XLEN).

  • S: Alias for s.

  • vd: A vector register, excluding v0 (requires V extension).

  • vm: The vector register v0 (requires V extension).

  • vr: A vector register (requires V extension).

Sparc:

  • I: An immediate 13-bit signed integer.

  • r: A 32-bit integer register.

  • f: Any floating-point register on SparcV8, or a floating-point register in the “low” half of the registers on SparcV9.

  • e: Any floating-point register. (Same as f on SparcV8.)

SystemZ:

  • I: An immediate unsigned 8-bit integer.

  • J: An immediate unsigned 12-bit integer.

  • K: An immediate signed 16-bit integer.

  • L: An immediate signed 20-bit integer.

  • M: An immediate integer 0x7fffffff.

  • Q: A memory address operand with a base address and a 12-bit immediate unsigned displacement.

  • R: A memory address operand with a base address, a 12-bit immediate unsigned displacement, and an index register.

  • S: A memory address operand with a base address and a 20-bit immediate signed displacement.

  • T: A memory address operand with a base address, a 20-bit immediate signed displacement, and an index register.

  • r or d: A 32, 64, or 128-bit integer register.

  • a: A 32, 64, or 128-bit integer address register (excludes R0, which in an address context evaluates as zero).

  • h: A 32-bit value in the high part of a 64-bit data register (LLVM-specific).

  • f: A 16, 32, 64, or 128-bit floating-point register.

X86:

  • I: An immediate integer between 0 and 31.

  • J: An immediate integer between 0 and 64.

  • K: An immediate signed 8-bit integer.

  • L: An immediate integer, 0xff or 0xffff or (in 64-bit mode only) 0xffffffff.

  • M: An immediate integer between 0 and 3.

  • N: An immediate unsigned 8-bit integer.

  • O: An immediate integer between 0 and 127.

  • e: An immediate 32-bit signed integer.

  • Z: An immediate 32-bit unsigned integer.

  • q: An 8, 16, 32, or 64-bit register which can be accessed as an 8-bit l integer register. On X86-32, this is the a, b, c, and d registers, and on X86-64, it is all of the integer registers. When the features egpr and inline-asm-use-gpr32 are both on, it will be extended to gpr32.

  • Q: An 8, 16, 32, or 64-bit register which can be accessed as an 8-bit h integer register. This is the a, b, c, and d registers.

  • r or l: An 8, 16, 32, or 64-bit integer register. When the features egpr and inline-asm-use-gpr32 are both on, it will be extended to gpr32.

  • R: An 8, 16, 32, or 64-bit “legacy” integer register – one which has existed since i386, and can be accessed without the REX prefix.

  • f: A 32, 64, or 80-bit ‘387 FPU stack pseudo-register.

  • y: A 64-bit MMX register, if MMX is enabled.

  • v: If SSE is enabled: a 32 or 64-bit scalar operand, or 128-bit vector operand in a SSE register. If AVX is also enabled, can also be a 256-bit vector operand in an AVX register. If AVX-512 is also enabled, can also be a 512-bit vector operand in an AVX512 register. Otherwise, an error.

  • Ws: A symbolic reference with an optional constant addend or a label reference.

  • x: The same as v, except that when AVX-512 is enabled, the x code only allocates into the first 16 AVX-512 registers, while the v code allocates into any of the 32 AVX-512 registers.

  • Y: The same as x, if SSE2 is enabled, otherwise an error.

  • A: Special case: allocates EAX first, then EDX, for a single operand (in 32-bit mode, a 64-bit integer operand will get split into two registers). It is not recommended to use this constraint, as in 64-bit mode, the 64-bit operand will get allocated only to RAX – if two 32-bit operands are needed, you’re better off splitting it yourself, before passing it to the asm statement.

  • jr: An 8, 16, 32, or 64-bit integer gpr16. It won’t be extended to gpr32 when feature egpr or inline-asm-use-gpr32 is on.

  • jR: An 8, 16, 32, or 64-bit integer gpr32 when feature egpr is on. Otherwise, same as r.
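As a rough illustration of how the X86 codes above appear in a constraint string, the following sketch assumes an x86-64 target and AT&T assembler syntax. The function name @rotate_left_3 is made up for the example, and the clobber list simply mirrors what a C front-end typically emits.

```llvm
; Hypothetical example: rotate a 32-bit value left by 3.
define i32 @rotate_left_3(i32 %x) {
  ; "=r" asks for an integer output register (operand $0).
  ; "0"  ties the first input to the same register as operand $0.
  ; "I"  accepts the immediate 3 because it lies between 0 and 31.
  %r = call i32 asm "roll $2, $0", "=r,0,I,~{dirflag},~{fpsr},~{flags}"(i32 %x, i32 3)
  ret i32 %r
}
```

If the immediate were outside the range accepted by I, the constraint would be expected to be rejected rather than silently truncated.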

XCore:

  • r: A 32-bit integer register.

Asm template argument modifiers

In the asm template string, modifiers can be used on the operand reference, like “${0:n}”.

The modifiers are, in general, expected to behave the same way they do in GCC. LLVM’s support is often implemented on an ‘as-needed’ basis, to support C inline asm code which was supported by GCC. A mismatch in behavior between LLVM and GCC likely indicates a bug in LLVM.

Target-independent:

  • a: Print a memory reference. Targets might customize the output.

  • c: Print an immediate integer constant unadorned, without the target-specific immediate punctuation (e.g. no $ prefix).

  • n: Negate and print immediate integer constant unadorned, without the target-specific immediate punctuation (e.g. no $ prefix).

  • l: Print as an unadorned label, without the target-specific label punctuation (e.g. no $ prefix).
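The effect of the c and n modifiers is easiest to see next to an unmodified operand. The sketch below assumes an x86-64 target with AT&T syntax, where immediates normally carry a $ prefix; other targets use different punctuation. The function name is hypothetical, and the template only emits an assembler comment so that nothing is executed.

```llvm
; Hypothetical example: print one immediate three different ways.
define void @show_immediate_modifiers() {
  ; For the immediate 42 one would expect, on x86 AT&T syntax:
  ;   $0      to print with punctuation, as $42
  ;   ${0:c}  to print unadorned,        as 42
  ;   ${0:n}  to print negated,          as -42
  call void asm sideeffect "# plain: $0, unadorned: ${0:c}, negated: ${0:n}", "i,~{dirflag},~{fpsr},~{flags}"(i32 42)
  ret void
}
```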

AArch64:

  • w: Print a GPR register with a w* name instead of x* name. E.g., instead of x30, print w30.

  • x: Print a GPR register with a x* name. (This is the default, anyhow.)

  • b, h, s, d, q: Print a floating-point/SIMD register with a b*, h*, s*, d*, or q* name, rather than the default of v*.

AMDGPU:

  • r: No effect.

ARM:

  • a: Print an operand as an address (with [ and ] surrounding a register).

  • P: No effect.

  • q: No effect.

  • y: Print a VFP single-precision register as an indexed double (e.g. print as d4[1] instead of s9).

  • B: Bitwise invert and print an immediate integer constant without # prefix.

  • L: Print the low 16-bits of an immediate integer constant.

  • M: Print as a register set suitable for ldm/stm. Also prints all register operands subsequent to the specified one (!), so use carefully.

  • Q: Print the low-order register of a register-pair, or the low-order register of a two-register operand.

  • R: Print the high-order register of a register-pair, or the high-order register of a two-register operand.

  • H: Print the second register of a register-pair. (On a big-endian system, H is equivalent to Q, and on little-endian system, H is equivalent to R.)

  • e: Print the low doubleword register of a NEON quad register.

  • f: Print the high doubleword register of a NEON quad register.

  • m: Print the base register of a memory operand without the [ and ] adornment.

Hexagon:

  • L: Print the second register of a two-register operand. Requires that it has been allocated consecutively to the first.

  • I: Print the letter ‘i’ if the operand is an integer constant, otherwise nothing. Used to print ‘addi’ vs ‘add’ instructions.

LoongArch:

  • u: Print an LASX register.

  • w: Print an LSX register.

  • z: Print $zero register if operand is zero, otherwise print it normally.

MSP430:

No additional modifiers.

MIPS:

  • X: Print an immediate integer as hexadecimal.

  • x: Print the low 16 bits of an immediate integer as hexadecimal.

  • d: Print an immediate integer as decimal.

  • m: Subtract one and print an immediate integer as decimal.

  • z: Print $0 if an immediate zero, otherwise print normally.

  • L: Print the low-order register of a two-register operand, or prints the address of the low-order word of a double-word memory operand.

  • M: Print the high-order register of a two-register operand, or prints the address of the high-order word of a double-word memory operand.

  • D: Print the second register of a two-register operand, or prints the second word of a double-word memory operand. (On a big-endian system, D is equivalent to L, and on little-endian system, D is equivalent to M.)

  • w: No effect. Provided for compatibility with GCC which requires this modifier in order to print MSA registers (W0-W31) with the f constraint.

NVPTX:

  • r: No effect.

PowerPC:

  • L: Print the second register of a two-register operand. Requires that it has been allocated consecutively to the first.

  • I: Print the letter ‘i’ if the operand is an integer constant, otherwise nothing. Used to print ‘addi’ vs ‘add’ instructions.

  • y: For a memory operand, prints formatter for a two-register X-form instruction. (Currently always prints r0,OPERAND).

  • U: Prints ‘u’ if the memory operand is an update form, and nothing otherwise. (NOTE: LLVM does not support update form, so this will currently always print nothing.)

  • X: Prints ‘x’ if the memory operand is an indexed form. (NOTE: LLVM does not support indexed form, so this will currently always print nothing.)

RISC-V:

  • i: Print the letter ‘i’ if the operand is not a register, otherwise print nothing. Used to print ‘addi’ vs ‘add’ instructions, etc.

  • z: Print the register zero if an immediate zero, otherwise print normally.

Sparc:

  • L: Print the low-order register of a two-register operand.

  • H: Print the high-order register of a two-register operand.

  • r: No effect.

SystemZ:

SystemZ implements only n, and does not support any of the other target-independent modifiers.

X86:

  • a: Print a memory reference. This displays as sym(%rip) for x86-64. i386 should only use this with the static relocation model.

  • c: Print an unadorned integer or symbol name. (The latter is target-specific behavior for this typically target-independent modifier).

  • A: Print a register name with a ‘*’ before it.

  • b: Print an 8-bit register name (e.g. al); do nothing on a memory operand.

  • h: Print the upper 8-bit register name (e.g. ah); do nothing on a memory operand.

  • w: Print the 16-bit register name (e.g. ax); do nothing on a memory operand.

  • k: Print the 32-bit register name (e.g. eax); do nothing on a memory operand.

  • q: Print the 64-bit register name (e.g. rax), if 64-bit registers are available, otherwise the 32-bit register name; do nothing on a memory operand.

  • n: Negate and print an unadorned integer, or, for operands other than an immediate integer (e.g. a relocatable symbol expression), print a ‘-’ before the operand. (The behavior for relocatable symbol expressions is a target-specific behavior for this typically target-independent modifier)

  • H: Print a memory reference with additional offset +8.

  • p: Print a raw symbol name (without syntax-specific prefixes).

  • P: Print a memory reference used as the argument of a call instruction or used with explicit base reg and index reg as its offset. So it can not use additional regs to present the memory reference. (E.g. omit (rip), even though it’s PC-relative.)
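A small sketch of the X86 register-size modifiers, assuming an x86-64 target and AT&T syntax; the function name is invented for the illustration. The input is a 32-bit value, but the w modifier makes the template refer to the 16-bit name of whichever register is allocated.

```llvm
; Hypothetical example: zero-extend the low 16 bits of %x.
define i32 @low_half(i32 %x) {
  ; ${1:w} prints the 16-bit name of the input register (e.g. %ax),
  ; while $0 prints the default 32-bit name of the output register.
  %r = call i32 asm "movzwl ${1:w}, $0", "=r,r"(i32 %x)
  ret i32 %r
}
```

The b, h, k and q modifiers are used in the same position to select the other register widths.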

XCore:

No additional modifiers.

Inline Asm Metadata

The call instructions that wrap inline asm nodes may have a “!srcloc” MDNode attached to them that contains a list of constant integers. If present, the code generator will use the integer as the location cookie value when reporting errors through the LLVMContext error reporting mechanisms. This allows a front-end to correlate backend errors that occur with inline asm back to the source code that produced it. For example:

call void asm sideeffect "something bad", ""(), !srcloc !42
...
!42 = !{ i64 1234567 }

It is up to the front-end to make sense of the magic numbers it places in the IR. If the MDNode contains multiple constants, the code generator will use the one that corresponds to the line of the asm that the error occurs on.
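The multi-constant case can be sketched as follows. The cookie values 100 and 117 and the deliberately invalid mnemonic are made up for the illustration; a real front-end would choose its own encoding of source locations.

```llvm
; Hypothetical example: a two-line asm string with one cookie per line.
define void @two_line_asm() {
  ; Line 0 of the asm is "nop"; line 1 is an intentionally bogus mnemonic.
  call void asm sideeffect "nop\0A\09not_an_instruction", ""(), !srcloc !0
  ret void
}

; One constant per asm line. A diagnostic raised for the second line
; would be expected to carry the cookie 117 rather than 100.
!0 = !{i64 100, i64 117}
```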

Metadata

LLVM IR allows metadata to be attached to instructions and global objects in the program that can convey extra information about the code to the optimizers and code generator.

There are two metadata primitives: strings and nodes. There are also specialized nodes which have a distinguished name and a set of named arguments.

Note

One example application of metadata is source-level debug information,which is currently the only user of specialized nodes.

Metadata does not have a type, and is not a value.

A value of non-metadata type can be used in a metadata context using the syntax ‘<type> <value>’.

All other metadata is identified in syntax as starting with an exclamation point (‘!’).

Metadata may be used in the following value contexts by using the metadata type:

  • Arguments to certain intrinsic functions, as described in their specification.

  • Arguments to the catchpad/cleanuppad instructions.

Note

Metadata can be “wrapped” in a MetadataAsValue so it can be referenced in a value context: MetadataAsValue is-a Value.

A typed value can be “wrapped” in ValueAsMetadata so it can be referenced in a metadata context: ValueAsMetadata is-a Metadata.

There is no explicit syntax for a ValueAsMetadata, and instead the fact that a type identifier cannot begin with an exclamation point is used to resolve ambiguity.

A metadata type implies a MetadataAsValue, and when followed with a ‘<type> <value>’ pair it wraps the typed value in a ValueAsMetadata.

For example, the first argument to this call is a MetadataAsValue(ValueAsMetadata(Value)):

call void @llvm.foo(metadata i32 1)

Whereas the first argument to this call is a MetadataAsValue(MDNode):

call void @llvm.foo(metadata !0)

The first element of this MDTuple is a MDNode:

!{!0}

And the first element of this MDTuple is a ValueAsMetadata(Value):

!{i32 1}

Metadata Strings (MDString)

A metadata string is a string surrounded by double quotes. It can contain any character by escaping non-printable characters with “\xx” where “xx” is the two digit hex code. For example: “!"test\00"”.

Note

A metadata string is metadata, but is not a metadata node.

Metadata Nodes (MDNode)

Metadata tuples are represented with notation similar to structure constants: a comma separated list of elements, surrounded by braces and preceded by an exclamation point. Metadata nodes can have any values as their operand. For example:

!{!"test\00", i32 10}

Metadata nodes that aren’t uniqued use the distinct keyword. For example:

!0 = distinct !{!"test\00", i32 10}

distinct nodes are useful when nodes shouldn’t be merged based on their content. They can also occur when transformations cause uniquing collisions when metadata operands change.

A named metadata is a collection of metadata nodes, which can be looked up in the module symbol table. For example:

!foo = !{!4, !3}

Metadata can be used as function arguments. Here the llvm.dbg.value intrinsic is using three metadata arguments:

call void @llvm.dbg.value(metadata !24, metadata !25, metadata !26)

Metadata can be attached to an instruction. Here metadata !21 is attached to the add instruction using the !dbg identifier:

%indvar.next = add i64 %indvar, 1, !dbg !21

Instructions may not have multiple metadata attachments with the same identifier.

Metadata can also be attached to a function or a global variable. Here metadata !22 is attached to the f1 and f2 functions, and the globals g1 and g2 using the !dbg identifier:

declare !dbg !22 void @f1()
define void @f2() !dbg !22 {
  ret void
}

@g1 = global i32 0, !dbg !22
@g2 = external global i32, !dbg !22

Unlike instructions, global objects (functions and global variables) may have multiple metadata attachments with the same identifier.
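A minimal sketch of a global carrying two attachments with the same identifier. It uses the !type kind; the global name and the type identifier strings are invented for the illustration.

```llvm
; Hypothetical example: two !type attachments on one global variable.
@table = global i32 0, !type !0, !type !1

; Each node pairs a byte offset with a type identifier.
!0 = !{i64 0, !"typeid1"}
!1 = !{i64 0, !"typeid2"}
```

Writing the same pair of attachments on an instruction would not be accepted, in line with the rule for instructions stated earlier.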

A transformation is required to drop any metadata attachment that it does not know or knows it can’t preserve. Currently there is an exception for metadata attachment to globals for !func_sanitize, !type, !absolute_symbol and !associated which can’t be unconditionally dropped unless the global is itself deleted.

Metadata attached to a module using named metadata may not be dropped, with the exception of debug metadata (named metadata with the name !llvm.dbg.*).

More information about specific metadata nodes recognized by the optimizers and code generator is found below.

Specialized Metadata Nodes

Specialized metadata nodes are custom data structures in metadata (as opposed to generic tuples). Their fields are labelled, and can be specified in any order.

These aren’t inherently debug info centric, but currently all the specialized metadata nodes are related to debug info.

DICompileUnit

DICompileUnit nodes represent a compile unit. The enums:, retainedTypes:, globals:, imports: and macros: fields are tuples containing the debug info to be emitted along with the compile unit, regardless of code optimizations (some nodes are only emitted if there are references to them from instructions). The debugInfoForProfiling: field is a boolean indicating whether or not line-table discriminators are updated to provide more-accurate debug info for profiling results.

!0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang",
                    isOptimized: true, flags: "-O2", runtimeVersion: 2,
                    splitDebugFilename: "abc.debug", emissionKind: FullDebug,
                    enums: !2, retainedTypes: !3, globals: !4, imports: !5,
                    macros: !6, dwoId: 0x0abcd)

Compile unit descriptors provide the root scope for objects declared in a specific compilation unit. File descriptors are defined using this scope. These descriptors are collected by a named metadata node !llvm.dbg.cu. They keep track of global variables, type information, and imported entities (declarations and namespaces).
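How a compile unit is collected by !llvm.dbg.cu can be sketched with a minimal module-level fragment. The producer string, file name and directory are placeholders; the module flag is the usual way a front-end records the debug info version.

```llvm
; Named metadata listing every compile unit in the module.
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

; Compile units are written as distinct nodes in textual IR.
!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1,
                             producer: "example-frontend", isOptimized: false,
                             runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "example.c", directory: "/tmp")
!2 = !{i32 2, !"Debug Info Version", i32 3}
```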

DIFile

DIFile nodes represent files. The filename: can include slashes.

!0 = !DIFile(filename: "path/to/file", directory: "/path/to/dir",
             checksumkind: CSK_MD5,
             checksum: "000102030405060708090a0b0c0d0e0f")

Files are sometimes used in scope: fields, and are the only valid target for file: fields.

The checksum: and checksumkind: fields are optional. If one of these fields is present, then the other is required to be present as well. Valid values for the checksumkind: field are: {CSK_MD5, CSK_SHA1, CSK_SHA256}.

DIBasicType

DIBasicType nodes represent primitive types, such as int, bool and float. tag: defaults to DW_TAG_base_type.

!0 = !DIBasicType(name: "unsigned char", size: 8, align: 8,
                  encoding: DW_ATE_unsigned_char)
!1 = !DIBasicType(tag: DW_TAG_unspecified_type, name: "decltype(nullptr)")

The encoding: describes the details of the type. Usually it’s one of the following:

DW_ATE_address       = 1
DW_ATE_boolean       = 2
DW_ATE_float         = 4
DW_ATE_signed        = 5
DW_ATE_signed_char   = 6
DW_ATE_unsigned      = 7
DW_ATE_unsigned_char = 8

DIFixedPointType

DIFixedPointType nodes represent fixed-point types. A fixed-point type is conceptually an integer with a scale factor. DIFixedPointType is derived from DIBasicType and inherits its attributes. However, only certain encodings are accepted:

DW_ATE_signed_fixed   = 13
DW_ATE_unsigned_fixed = 14

There are three kinds of fixed-point type: binary, where the scale factor is a power of 2; decimal, where the scale factor is a power of 10; and rational, where the scale factor is an arbitrary rational number.

!0 = !DIFixedPointType(name: "decimal", size: 8, encoding: DW_ATE_signed_fixed,
                       kind: Decimal, factor: -4)
!1 = !DIFixedPointType(name: "binary", size: 8, encoding: DW_ATE_unsigned_fixed,
                       kind: Binary, factor: -16)
!2 = !DIFixedPointType(name: "rational", size: 8, encoding: DW_ATE_signed_fixed,
                       kind: Rational, numerator: 1234, denominator: 5678)

DISubroutineType

DISubroutineType nodes represent subroutine types. Their types: field refers to a tuple; the first operand is the return type, while the rest are the types of the formal arguments in order. If the first operand is null, that represents a function with no return value (such as void foo() {} in C++).

!0 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
!1 = !DIBasicType(name: "char", size: 8, align: 8, encoding: DW_ATE_signed_char)
!2 = !DISubroutineType(types: !{null, !0, !1}) ; void (int, char)

DIDerivedType

DIDerivedType nodes represent types derived from other types, such as qualified types.

!0 = !DIBasicType(name: "unsigned char", size: 8, align: 8,
                  encoding: DW_ATE_unsigned_char)
!1 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !0, size: 32,
                    align: 32)

The following tag: values are valid:

DW_TAG_member             = 13
DW_TAG_pointer_type       = 15
DW_TAG_reference_type     = 16
DW_TAG_typedef            = 22
DW_TAG_inheritance        = 28
DW_TAG_ptr_to_member_type = 31
DW_TAG_const_type         = 38
DW_TAG_friend             = 42
DW_TAG_volatile_type      = 53
DW_TAG_restrict_type      = 55
DW_TAG_atomic_type        = 71
DW_TAG_immutable_type     = 75

DW_TAG_member is used to define a member of a composite type. The type of the member is the baseType:. The offset: is the member’s bit offset. If the composite type has an ODR identifier: and does not set flags: DIFlagFwdDecl, then the member is uniqued based only on its name: and scope:.

DW_TAG_inheritance and DW_TAG_friend are used in the elements: field of composite types to describe parents and friends.

DW_TAG_typedef is used to provide a name for the baseType:.

DW_TAG_pointer_type, DW_TAG_reference_type, DW_TAG_const_type, DW_TAG_volatile_type, DW_TAG_restrict_type, DW_TAG_atomic_type and DW_TAG_immutable_type are used to qualify the baseType:.

Note that the void * type is expressed as a type derived from NULL.
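The note about void * can be sketched as a pointer type whose baseType: is null. The 64-bit pointer size is an assumption about the target, not something required by the node itself.

```llvm
; void *        : a pointer type derived from no base type.
!0 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64)

; void *const   : a const qualifier applied to the pointer above.
!1 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !0)

; typedef void *handle_t;  (the typedef name is hypothetical)
!2 = !DIDerivedType(tag: DW_TAG_typedef, name: "handle_t", baseType: !0)
```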

DICompositeType

DICompositeType nodes represent types composed of other types, like structures and unions. elements: points to a tuple of the composed types.

If the source language supports ODR, the identifier: field gives the unique identifier used for type merging between modules. When specified, subprogram declarations and member derived types that reference the ODR-type in their scope: change uniquing rules.

For a given identifier:, there should only be a single composite type that does not have flags: DIFlagFwdDecl set. LLVM tools that link modules together will unique such definitions at parse time via the identifier: field, even if the nodes are distinct.

!0 = !DIEnumerator(name: "SixKind", value: 7)
!1 = !DIEnumerator(name: "SevenKind", value: 7)
!2 = !DIEnumerator(name: "NegEightKind", value: -8)
!3 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "Enum", file: !12,
                      line: 2, size: 32, align: 32, identifier: "_M4Enum",
                      elements: !{!0, !1, !2})

The following tag: values are valid:

DW_TAG_array_type       = 1
DW_TAG_class_type       = 2
DW_TAG_enumeration_type = 4
DW_TAG_structure_type   = 19
DW_TAG_union_type       = 23
DW_TAG_variant          = 25
DW_TAG_variant_part     = 51

For DW_TAG_array_type, the elements: should be subrange descriptors (DISubrange) or subrange type descriptors (DISubrangeType), each representing the range of subscripts at that level of indexing. The DIFlagVector flag to flags: indicates that an array type is a native packed vector. The optional dataLocation is a DIExpression that describes how to get from an object’s address to the actual raw data, if they aren’t equivalent. This is only supported for array types, particularly to describe Fortran arrays, which have an array descriptor in addition to the array data. Alternatively it can also be DIVariable which has the address of the actual raw data. The Fortran language supports pointer arrays which can be attached to actual arrays; this attachment between pointer and pointee is called association. The optional associated is a DIExpression that describes whether the pointer array is currently associated. The optional allocated is a DIExpression that describes whether the allocatable array is currently allocated. The optional rank is a DIExpression that describes the rank (number of dimensions) of a Fortran assumed-rank array (rank is known at runtime). The optional bitStride is an unsigned constant that describes the number of bits occupied by an element of the array; this is only needed if it differs from the element type’s natural size, and is normally used for packed arrays.
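As a sketch of the array case, the fragment below describes the C declarations int a[5] and int m[2][3], assuming a 32-bit int. Each level of indexing contributes one subrange to elements:, outermost dimension first.

```llvm
!0 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

; int a[5]: one subrange, 5 * 32 = 160 bits in total.
!1 = !DICompositeType(tag: DW_TAG_array_type, baseType: !0, size: 160,
                      elements: !2)
!2 = !{!3}
!3 = !DISubrange(count: 5)

; int m[2][3]: two subranges, 2 * 3 * 32 = 192 bits in total.
!4 = !DICompositeType(tag: DW_TAG_array_type, baseType: !0, size: 192,
                      elements: !5)
!5 = !{!6, !7}
!6 = !DISubrange(count: 2)
!7 = !DISubrange(count: 3)
```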

For DW_TAG_enumeration_type, the elements: should be enumerator descriptors, each representing the definition of an enumeration value for the set. All enumeration type descriptors are collected in the enums: field of the compile unit.

For DW_TAG_structure_type, DW_TAG_class_type, and DW_TAG_union_type, the elements: should be derived types with tag: DW_TAG_member, tag: DW_TAG_inheritance, or tag: DW_TAG_friend; or subprograms with isDefinition: false.
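A structure with two members might be described as in the sketch below, which corresponds roughly to struct Point { int x; int y; } with a 32-bit int. The file name, line numbers and type name are placeholders.

```llvm
!0 = !DIFile(filename: "point.c", directory: "/tmp")
!1 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

; The structure itself; elements: lists its members.
!2 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "Point",
                               file: !0, line: 1, size: 64, elements: !3)
!3 = !{!4, !5}

; Each member is a DIDerivedType scoped to the structure.
; offset: is given in bits, so y starts 32 bits into the object.
!4 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !2, file: !0,
                    line: 2, baseType: !1, size: 32)
!5 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !2, file: !0,
                    line: 3, baseType: !1, size: 32, offset: 32)
```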

DW_TAG_variant_part introduces a variant part of a structure type. This should have a discriminant, a member that is used to decide which elements are active. The elements of the variant part should each be a DW_TAG_member; if a member has a non-null ExtraData, then it is a ConstantInt or ConstantDataArray indicating the values of the discriminant member that cause the activation of this branch. A member itself may be of composite type with tag DW_TAG_variant; in this case the members of that composite type are inlined into the current one.

DISubrange

DISubrange nodes are the elements for DW_TAG_array_type variants of DICompositeType.

!0 = !DISubrange(count: 5, lowerBound: 0) ; array counting from 0
!1 = !DISubrange(count: 5, lowerBound: 1) ; array counting from 1
!2 = !DISubrange(count: -1) ; empty array.

; Scopes used in rest of example
!6 = !DIFile(filename: "vla.c", directory: "/path/to/file")
!7 = distinct !DICompileUnit(language: DW_LANG_C99, file: !6)
!8 = distinct !DISubprogram(name: "foo", scope: !7, file: !6, line: 5)

; Use of local variable as count value
!9 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!10 = !DILocalVariable(name: "count", scope: !8, file: !6, line: 42, type: !9)
!11 = !DISubrange(count: !10, lowerBound: 0)

; Use of global variable as count value
!12 = !DIGlobalVariable(name: "count", scope: !8, file: !6, line: 22, type: !9)
!13 = !DISubrange(count: !12, lowerBound: 0)

DISubrangeType

DISubrangeType is similar to DISubrange, but it is also a DIType. It may be used as the type of an object, but could also be used as an array index.

Like DISubrange, it can hold a lower bound and count, or a lower bound and upper bound. A DISubrangeType refers to the underlying type of which it is a subrange; this type can be an integer type or an enumeration type.

A DISubrangeType may also have a stride – unlike DISubrange, this stride is a bit stride. The stride is only useful when a DISubrangeType is used as an array index type.

Finally, DISubrangeType may have a bias. In Ada, a program can request that a subrange value be stored in the minimum number of bits required. In this situation, the stored value is biased by the lower bound – e.g., a range -7..0 may take 3 bits in memory, and the value -5 would be stored as 2 (a bias of -7).

; Scopes used in rest of example
!0 = !DIFile(filename: "vla.c", directory: "/path/to/file")
!1 = distinct !DICompileUnit(language: DW_LANG_C99, file: !0)
!2 = distinct !DISubprogram(name: "foo", scope: !1, file: !0, line: 5)

; Base type used in example.
!3 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

; A simple subrange with a name.
!4 = !DISubrangeType(name: "subrange", file: !0, line: 17, size: 32,
                     align: 32, baseType: !3, lowerBound: 18, count: 12)
; A subrange with a bias.
!5 = !DISubrangeType(name: "biased", lowerBound: -7, upperBound: 0,
                     bias: -7, size: 3)
; A subrange with a bit stride.
!6 = !DISubrangeType(name: "biased", lowerBound: 0, upperBound: 7,
                     stride: 3)

DIEnumerator

DIEnumerator nodes are the elements for DW_TAG_enumeration_type variants of DICompositeType.

!0 = !DIEnumerator(name: "SixKind", value: 7)
!1 = !DIEnumerator(name: "SevenKind", value: 7)
!2 = !DIEnumerator(name: "NegEightKind", value: -8)

DITemplateTypeParameter

DITemplateTypeParameter nodes represent type parameters to generic source language constructs. They are used (optionally) in DICompositeType and DISubprogram templateParams: fields.

!0 = !DITemplateTypeParameter(name: "Ty", type: !1)

DITemplateValueParameter

DITemplateValueParameter nodes represent value parameters to generic source language constructs. tag: defaults to DW_TAG_template_value_parameter, but if specified can also be set to DW_TAG_GNU_template_template_param or DW_TAG_GNU_template_param_pack. They are used (optionally) in DICompositeType and DISubprogram templateParams: fields.

!0 = !DITemplateValueParameter(name: "Ty", type: !1, value: i32 7)

DINamespace

DINamespace nodes represent namespaces in the source language.

!0 = !DINamespace(name: "myawesomeproject", scope: !1, file: !2, line: 7)

DIGlobalVariable

DIGlobalVariable nodes represent global variables in the source language.

@foo = global i32 0, !dbg !0
!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = !DIGlobalVariable(name: "foo", linkageName: "foo", scope: !2,
                       file: !3, line: 7, type: !4, isLocal: true,
                       isDefinition: false, declaration: !5)

DIGlobalVariableExpression

DIGlobalVariableExpression nodes tie a DIGlobalVariable together with a DIExpression.

@lower = global i32 0, !dbg !0
@upper = global i32 0, !dbg !1
!0 = !DIGlobalVariableExpression(
         var: !2,
         expr: !DIExpression(DW_OP_LLVM_fragment, 0, 32)
         )
!1 = !DIGlobalVariableExpression(
         var: !2,
         expr: !DIExpression(DW_OP_LLVM_fragment, 32, 32)
         )
!2 = !DIGlobalVariable(name: "split64", linkageName: "split64", scope: !3,
                       file: !4, line: 8, type: !5, declaration: !6)

All global variable expressions should be referenced by the globals: field of a compile unit.

DISubprogram

DISubprogram nodes represent functions from the source language. A distinct DISubprogram may be attached to a function definition using !dbg metadata. A unique DISubprogram may be attached to a function declaration used for call site debug info. The retainedNodes: field is a list of variables and labels that must be retained, even if their IR counterparts are optimized out of the IR. The type: field must point at a DISubroutineType.

When spFlags: DISPFlagDefinition is not present, subprograms describe a declaration in the type tree as opposed to a definition of a function. In this case, the declaration field must be empty. If the scope is a composite type with an ODR identifier: and that does not set flags: DIFlagFwdDecl, then the subprogram declaration is uniqued based only on its linkageName: and scope:.

define void @_Z3foov() !dbg !0 {
  ...
}

!0 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !1,
                            file: !2, line: 7, type: !3,
                            spFlags: DISPFlagDefinition | DISPFlagLocalToUnit,
                            scopeLine: 8, containingType: !4,
                            virtuality: DW_VIRTUALITY_pure_virtual,
                            virtualIndex: 10, flags: DIFlagPrototyped,
                            isOptimized: true, unit: !5, templateParams: !6,
                            declaration: !7, retainedNodes: !8,
                            thrownTypes: !9)

DILexicalBlock

DILexicalBlock nodes describe nested blocks within a subprogram. The line number and column numbers are used to distinguish two lexical blocks at the same depth. They are valid targets for scope: fields.

!0 = distinct !DILexicalBlock(scope: !1, file: !2, line: 7, column: 35)

Usually lexical blocks are distinct to prevent node merging based on operands.

DILexicalBlockFile

DILexicalBlockFile nodes are used to discriminate between sections of a lexical block. The file: field can be changed to indicate textual inclusion, or the discriminator: field can be used to discriminate between control flow within a single block in the source language.

!0 = !DILexicalBlock(scope: !3, file: !4, line: 7, column: 35)
!1 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 0)
!2 = !DILexicalBlockFile(scope: !0, file: !4, discriminator: 1)

DILocation

DILocation nodes represent source debug locations. The scope: field is mandatory, and points at a DILexicalBlockFile, a DILexicalBlock, or a DISubprogram.
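To show where such locations sit in a module, here is a small self-contained sketch in which one instruction's location is scoped to a lexical block and another to the enclosing subprogram. The function, file name and line numbers are invented for the illustration.

```llvm
define i32 @foo(i32 %a) !dbg !4 {
entry:
  %r = add i32 %a, 1, !dbg !8      ; location inside the nested block
  ret i32 %r, !dbg !9              ; location directly in the subprogram
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1,
                             emissionKind: FullDebug)
!1 = !DIFile(filename: "foo.c", directory: "/tmp")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1,
                            type: !5, scopeLine: 1,
                            spFlags: DISPFlagDefinition, unit: !0)
!5 = !DISubroutineType(types: !6)
!6 = !{!3, !3}                     ; int (int)
!7 = distinct !DILexicalBlock(scope: !4, file: !1, line: 2, column: 3)
!8 = !DILocation(line: 3, column: 12, scope: !7)
!9 = !DILocation(line: 4, column: 3, scope: !4)
```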

!0 = !DILocation(line: 2900, column: 42, scope: !1, inlinedAt: !2)

DILocalVariable

DILocalVariable nodes represent local variables in the source language. If the arg: field is set to non-zero, then this variable is a subprogram parameter, and it will be included in the retainedNodes: field of its DISubprogram.

!0 = !DILocalVariable(name: "this", arg: 1, scope: !3, file: !2, line: 7,
                      type: !3, flags: DIFlagArtificial)
!1 = !DILocalVariable(name: "x", arg: 2, scope: !4, file: !2, line: 7,
                      type: !3)
!2 = !DILocalVariable(name: "y", scope: !5, file: !2, line: 7, type: !3)

DIExpression

DIExpression nodes represent expressions that are inspired by the DWARF expression language. They are used in debug records (such as #dbg_declare and #dbg_value) to describe how the referenced LLVM variable relates to the source language variable. Debug expressions are interpreted left-to-right: start by pushing the value/address operand of the record onto a stack, then repeatedly push and evaluate opcodes from the DIExpression until the final variable description is produced.

The currently supported opcode vocabulary is limited:

  • DW_OP_deref dereferences the top of the expression stack.

  • DW_OP_plus pops the last two entries from the expression stack, adds them together and appends the result to the expression stack.

  • DW_OP_minus pops the last two entries from the expression stack, subtracts the last entry from the second last entry and appends the result to the expression stack.

  • DW_OP_plus_uconst, 93 adds 93 to the working expression.

  • DW_OP_LLVM_fragment, 16, 8 specifies the offset and size (16 and 8 here, respectively) of the variable fragment from the working expression. Note that contrary to DW_OP_bit_piece, the offset is describing the location within the described source variable.

  • DW_OP_LLVM_convert, 16, DW_ATE_signed specifies a bit size and encoding (16 and DW_ATE_signed here, respectively) to which the top of the expression stack is to be converted. Maps into a DW_OP_convert operation that references a base type constructed from the supplied values.

  • DW_OP_LLVM_extract_bits_sext, 16, 8 specifies the offset and size (16 and 8 here, respectively) of bits that are to be extracted and sign-extended from the value at the top of the expression stack. If the top of the expression stack is a memory location then these bits are extracted from the value pointed to by that memory location. Maps into a DW_OP_shl followed by DW_OP_shra.

  • DW_OP_LLVM_extract_bits_zext behaves similarly to DW_OP_LLVM_extract_bits_sext, but zero-extends instead of sign-extending. Maps into a DW_OP_shl followed by DW_OP_shr.

  • DW_OP_LLVM_tag_offset, tag_offset specifies that a memory tag should be optionally applied to the pointer. The memory tag is derived from the given tag offset in an implementation-defined manner.

  • DW_OP_swap swaps the top two stack entries.

  • DW_OP_xderef provides an extended dereference mechanism. The entry at the top of the stack is treated as an address. The second stack entry is treated as an address space identifier.

  • DW_OP_stack_value marks a constant value.

  • DW_OP_LLVM_entry_value, N refers to the value a register had upon function entry. When targeting DWARF, a DBG_VALUE(reg, ..., DIExpression(DW_OP_LLVM_entry_value, 1, ...)) is lowered to DW_OP_entry_value [reg], ..., which pushes the value reg had upon function entry onto the DWARF expression stack.

    The next (N - 1) operations will be part of the DW_OP_entry_value block argument. For example, !DIExpression(DW_OP_LLVM_entry_value, 1, DW_OP_plus_uconst, 123, DW_OP_stack_value) specifies an expression where the entry value of reg is pushed onto the stack, and 123 is added to it. Due to framework limitations N must be 1; in other words, DW_OP_entry_value always refers to the value/address operand of the instruction.

    Because DW_OP_LLVM_entry_value is defined in terms of registers, it is usually used in MIR, but it is also allowed in LLVM IR when targeting a swiftasync argument. The operation is introduced by:

    • LiveDebugValues pass, which applies it to function parameters that are unmodified throughout the function. Support is limited to simple register location descriptions, or as indirect locations (e.g., parameters passed-by-value to a callee via a pointer to a temporary copy made in the caller).

    • AsmPrinter pass when a call site parameter value (DW_AT_call_site_parameter_value) is represented as entry value of the parameter.

    • CoroSplit pass, which may move variables from allocas into a coroutine frame. If the coroutine frame is a swiftasync argument, the variable is described with a DW_OP_LLVM_entry_value operation.

  • DW_OP_LLVM_arg, N is used in debug intrinsics that refer to more than one value, such as one that calculates the sum of two registers. This is always used in combination with an ordered list of values, such that DW_OP_LLVM_arg, N refers to the Nth element in that list. For example, !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_minus, DW_OP_stack_value) used with the list (%reg1, %reg2) would evaluate to %reg1 - %reg2. This list of values should be provided by the containing intrinsic/instruction.

  • DW_OP_breg (or DW_OP_bregx) represents the content at the provided signed offset of the specified register. The opcode is only generated by the AsmPrinter pass to describe a call site parameter value which requires an expression over two registers.

  • DW_OP_push_object_address pushes the address of the object which can then serve as a descriptor in subsequent calculation. This opcode can be used to calculate the bounds of a Fortran allocatable array which has array descriptors.

  • DW_OP_over duplicates the entry currently second in the stack at the top of the stack. This opcode can be used to calculate the bounds of a Fortran assumed-rank array, whose rank is known at run time and whose current dimension number is implicitly the first element of the stack.

  • DW_OP_LLVM_implicit_pointer specifies the dereferenced value. It can be used to represent pointer variables that are optimized out but whose pointed-to value is known. This operator is required because it differs from the DWARF operator DW_OP_implicit_pointer in representation and specification (number and types of operands), and the latter cannot be used for multiple levels of indirection.

IR for "*ptr = 4;"
--------------
  #dbg_value(i32 4, !17, !DIExpression(DW_OP_LLVM_implicit_pointer), !20)
!17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
                       type: !18)
!18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
!19 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!20 = !DILocation(line: 10, scope: !12)

IR for "**ptr = 4;"
--------------
  #dbg_value(i32 4, !17,
    !DIExpression(DW_OP_LLVM_implicit_pointer, DW_OP_LLVM_implicit_pointer),
    !21)
!17 = !DILocalVariable(name: "ptr1", scope: !12, file: !3, line: 5,
                       type: !18)
!18 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !19, size: 64)
!19 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !20, size: 64)
!20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!21 = !DILocation(line: 10, scope: !12)

DWARF specifies three kinds of simple location descriptions: register, memory, and implicit location descriptions. Note that a location description is defined over certain ranges of a program, i.e., the location of a variable may change over the course of the program. Register and memory location descriptions describe the concrete location of a source variable (in the sense that a debugger might modify its value), whereas implicit locations describe merely the actual value of a source variable which might not exist in registers or in memory (see DW_OP_stack_value).

A #dbg_declare record describes an indirect value (the address) of a source variable. The first operand of the record must be an address of some kind. A DIExpression operand to the record refines this address to produce a concrete location for the source variable.

A #dbg_value record describes the direct value of a source variable. The first operand of the record may be a direct or indirect value. A DIExpression operand to the record refines the first operand to produce a direct value. For example, if the first operand is an indirect value, it may be necessary to insert DW_OP_deref into the DIExpression in order to produce a valid debug record.

Note

A DIExpression is interpreted in the same way regardless of which kind of debug record it’s attached to.

DIExpressions are always printed and parsed inline; they can never be referenced by an ID (e.g. !1).

!DIExpression(DW_OP_deref)
!DIExpression(DW_OP_plus_uconst, 3)
!DIExpression(DW_OP_constu, 3, DW_OP_plus)
!DIExpression(DW_OP_bit_piece, 3, 7)
!DIExpression(DW_OP_deref, DW_OP_constu, 3, DW_OP_plus, DW_OP_LLVM_fragment, 3, 7)
!DIExpression(DW_OP_constu, 2, DW_OP_swap, DW_OP_xderef)
!DIExpression(DW_OP_constu, 42, DW_OP_stack_value)
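The left-to-right stack evaluation described in this section can be modelled with a short Python sketch. This is an illustration only, not LLVM's implementation: the function name evaluate, the memory dictionary standing in for target memory, and the choice of opcodes handled are all assumptions of the sketch. It also assumes that the record's operand is pushed implicitly unless the expression uses DW_OP_LLVM_arg, in which case values are pushed only where the expression asks for them.

```python
def evaluate(ops, values, memory=None):
    """Evaluate a flat list of DIExpression opcodes and operands.

    `values` holds the record's value/address operand(s); `memory` is a
    hypothetical address -> value mapping used to model DW_OP_deref.
    """
    memory = memory or {}
    # Assumption of this sketch: without DW_OP_LLVM_arg the single operand
    # of the record is pushed first; with it, the stack starts empty.
    stack = [] if "DW_OP_LLVM_arg" in ops else [values[0]]
    i = 0
    while i < len(ops):
        op = ops[i]
        i += 1
        if op in ("DW_OP_constu", "DW_OP_plus_uconst", "DW_OP_LLVM_arg"):
            operand = ops[i]  # these opcodes carry one inline operand
            i += 1
            if op == "DW_OP_constu":
                stack.append(operand)
            elif op == "DW_OP_plus_uconst":
                stack.append(stack.pop() + operand)
            else:  # DW_OP_LLVM_arg, N: push the Nth listed value
                stack.append(values[operand])
        elif op == "DW_OP_plus":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "DW_OP_minus":
            # Subtract the last entry from the second last entry.
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs - rhs)
        elif op == "DW_OP_swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "DW_OP_deref":
            stack.append(memory[stack.pop()])
        elif op == "DW_OP_stack_value":
            pass  # only marks the result as a value; the stack is unchanged
        else:
            raise ValueError(f"opcode not modelled in this sketch: {op}")
    return stack[-1]
```

For instance, evaluate(["DW_OP_LLVM_arg", 0, "DW_OP_LLVM_arg", 1, "DW_OP_minus", "DW_OP_stack_value"], [10, 3]) follows the %reg1 - %reg2 example from the opcode list with the two registers holding 10 and 3.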
DIAssignID

DIAssignID nodes have no operands and are always distinct. They are used to link together #dbg_assign records and instructions that store in IR. See Debug Info Assignment Tracking for more info.

store i32 %a, ptr %a.addr, align 4, !DIAssignID !2
#dbg_assign(%a, !1, !DIExpression(), !2, %a.addr, !DIExpression(), !3)

!2 = distinct !DIAssignID()

DIArgList

DIArgList nodes hold a list of constant or SSA value references. These are used in debug records in combination with a DIExpression that uses the DW_OP_LLVM_arg operator. Because a DIArgList may refer to local values within a function, it must only be used as a function argument, must always be inlined, and cannot appear in named metadata.

#dbg_value(!DIArgList(i32 %a, i32 %b),
           !16,
           !DIExpression(DW_OP_LLVM_arg, 0, DW_OP_LLVM_arg, 1, DW_OP_plus),
           !26)

DIFlags

These flags encode various properties of DINodes.

The ExportSymbols flag marks a class, struct or union whose members may be referenced as if they were defined in the containing class or union. This flag is used to decide whether the DW_AT_export_symbols can be used for the structure type.

DIObjCProperty

DIObjCProperty nodes represent Objective-C property nodes.

!3 = !DIObjCProperty(name: "foo", file: !1, line: 7, setter: "setFoo",
                     getter: "getFoo", attributes: 7, type: !2)

DIImportedEntity

DIImportedEntity nodes represent entities (such as modules) imported into a compile unit. The elements: field is a list of renamed entities (such as variables and subprograms) in the imported entity (such as a module).

!2 = !DIImportedEntity(tag: DW_TAG_imported_module, name: "foo", scope: !0,
                       entity: !1, line: 7, elements: !3)
!3 = !{!4}
!4 = !DIImportedEntity(tag: DW_TAG_imported_declaration, name: "bar", scope: !0,
                       entity: !5, line: 7)

DIMacro

DIMacro nodes represent the definition or undefinition of a macro identifier. The name: field is the macro identifier, followed by macro parameters when defining a function-like macro, and the value: field is the token-string used to expand the macro identifier.

!2 = !DIMacro(macinfo: DW_MACINFO_define, line: 7, name: "foo(x)",
              value: "((x) + 1)")
!3 = !DIMacro(macinfo: DW_MACINFO_undef, line: 30, name: "foo")

DIMacroFile

DIMacroFile nodes represent inclusion of source files. The nodes: field is a list of DIMacro and DIMacroFile nodes that appear in the included source file.

!2 = !DIMacroFile(macinfo: DW_MACINFO_start_file, line: 7, file: !2,
                  nodes: !3)

DILabel

DILabel nodes represent labels within a DISubprogram. The scope: field must be a DILexicalBlockFile, a DILexicalBlock, or a DISubprogram. The name: field is the label identifier. The file: field is the DIFile the label is present in. The line: and column: fields are the source line and column within the file where the label is declared.

Furthermore, a label can be marked as artificial, i.e. compiler-generated, using isArtificial:. Such artificial labels are generated, e.g., by the CoroSplit pass. In addition, the CoroSplit pass also uses the coroSuspendIdx: field to identify the coroutine suspend points.

scope:, name:, file: and line: are mandatory. The remaining fields are optional.

!2 = !DILabel(scope: !0, name: "foo", file: !1, line: 7, column: 4)
!3 = !DILabel(scope: !0, name: "__coro_resume_3", file: !1, line: 9, column: 3, isArtificial: true, coroSuspendIdx: 3)

DICommonBlock

DICommonBlock nodes represent Fortran common blocks. The scope: field is mandatory and points to a DILexicalBlockFile, a DILexicalBlock, or a DISubprogram. The declaration:, name:, file:, and line: fields are optional.

DIModule

DIModule nodes represent a source language module, for example, a Clang module, or a Fortran module. The scope: field is mandatory and points to a DILexicalBlockFile, a DILexicalBlock, or a DISubprogram. The name: field is mandatory. The configMacros:, includePath:, apinotes:, file:, line:, and isDecl: fields are optional.

DIStringType

DIStringType nodes represent a Fortran CHARACTER(n) type, with a dynamic length and location encoded as an expression. The tag: field is optional and defaults to DW_TAG_string_type. The name:, stringLength:, stringLengthExpression:, stringLocationExpression:, size:, align:, and encoding: fields are optional.

If not present, the size: and align: fields default to the value zero.

The length in bits of the string is specified by the first of the following fields present:

  • stringLength:, which points to a DIVariable whose value is the string length in bits.

  • stringLengthExpression:, which points to a DIExpression which computes the length in bits.

  • size:, which contains the literal length in bits.

The stringLocationExpression: points to a DIExpression which describes the “data location” of the string object, if present.

‘tbaa’ Metadata

In LLVM IR, memory does not have types, so LLVM’s own type system is not suitable for doing type based alias analysis (TBAA). Instead, metadata is added to the IR to describe a type system of a higher level language. This can be used to implement C/C++ strict type aliasing rules, but it can also be used to implement custom alias analysis behavior for other languages.

This description of LLVM’s TBAA system is broken into two parts: Semantics talks about high level issues, and Representation talks about the metadata encoding of various entities.

It is always possible to trace any TBAA node to a “root” TBAA node (details in the Representation section). TBAA nodes with different roots have an unknown aliasing relationship, and LLVM conservatively infers MayAlias between them. The rules mentioned in this section only pertain to TBAA nodes living under the same root.

Semantics

The TBAA metadata system, referred to as “struct path TBAA” (not to be confused with tbaa.struct), consists of the following high level concepts: Type Descriptors, further subdivided into scalar type descriptors and struct type descriptors; and Access Tags.

Type descriptors describe the type system of the higher level language being compiled. Scalar type descriptors describe types that do not contain other types. Each scalar type has a parent type, which must also be a scalar type or the TBAA root. Via this parent relation, scalar types within a TBAA root form a tree. Struct type descriptors denote types that contain a sequence of other type descriptors, at known offsets. These contained type descriptors can either be struct type descriptors themselves or scalar type descriptors.

Access tags are metadata nodes attached to load and store instructions. Access tags use type descriptors to describe the location being accessed in terms of the type system of the higher level language. Access tags are tuples consisting of a base type, an access type and an offset. The base type is a scalar type descriptor or a struct type descriptor, the access type is a scalar type descriptor, and the offset is a constant integer.

The access tag (BaseTy, AccessTy, Offset) can describe one of two things:

  • If BaseTy is a struct type, the tag describes a memory access (load or store) of a value of type AccessTy contained in the struct type BaseTy at offset Offset.

  • If BaseTy is a scalar type, Offset must be 0 and BaseTy and AccessTy must be the same; and the access tag describes a scalar access with scalar type AccessTy.

We first define an ImmediateParent relation on (BaseTy, Offset) tuples this way:

  • If BaseTy is a scalar type then ImmediateParent(BaseTy, 0) is (ParentTy, 0) where ParentTy is the parent of the scalar type as described in the TBAA metadata. ImmediateParent(BaseTy, Offset) is undefined if Offset is non-zero.

  • If BaseTy is a struct type then ImmediateParent(BaseTy, Offset) is (NewTy, NewOffset) where NewTy is the type contained in BaseTy at offset Offset and NewOffset is Offset adjusted to be relative within that inner type.

A memory access with an access tag (BaseTy1, AccessTy1, Offset1) aliases a memory access with an access tag (BaseTy2, AccessTy2, Offset2) if either (BaseTy1, Offset1) is reachable from (BaseTy2, Offset2) via the ImmediateParent relation or vice versa. If memory accesses alias even though they are noalias according to !tbaa metadata, the behavior is undefined.

As a concrete example, the type descriptor graph for the following program

struct Inner {
  int i;    // offset 0
  float f;  // offset 4
};

struct Outer {
  float f;  // offset 0
  double d; // offset 4
  struct Inner inner_a;  // offset 12
};

void f(struct Outer* outer, struct Inner* inner, float* f, int* i, char* c) {
  outer->f = 0;            // tag0: (OuterStructTy, FloatScalarTy, 0)
  outer->inner_a.i = 0;    // tag1: (OuterStructTy, IntScalarTy, 12)
  outer->inner_a.f = 0.0;  // tag2: (OuterStructTy, FloatScalarTy, 16)
  *f = 0.0;                // tag3: (FloatScalarTy, FloatScalarTy, 0)
}

is (note that in C and C++, char can be used to access any arbitrary type):

Root = "TBAA Root"
CharScalarTy = ("char", Root, 0)
FloatScalarTy = ("float", CharScalarTy, 0)
DoubleScalarTy = ("double", CharScalarTy, 0)
IntScalarTy = ("int", CharScalarTy, 0)
InnerStructTy = {"Inner", (IntScalarTy, 0), (FloatScalarTy, 4)}
OuterStructTy = {"Outer", (FloatScalarTy, 0), (DoubleScalarTy, 4),
                 (InnerStructTy, 12)}

with (e.g.) ImmediateParent(OuterStructTy, 12) = (InnerStructTy, 0), ImmediateParent(InnerStructTy, 0) = (IntScalarTy, 0), and ImmediateParent(IntScalarTy, 0) = (CharScalarTy, 0).
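The ImmediateParent relation and the aliasing rule can be sketched as a small Python model of the example above. This is a hedged illustration rather than LLVM's algorithm: the dictionaries SCALARS and STRUCTS and the function names are made up for the sketch, the struct field containing an offset is assumed to be the last field whose offset does not exceed it (descriptors carry no field sizes), and a tuple is assumed to be reachable from itself so that two accesses with identical tags alias.

```python
ROOT = "Root"

# Scalar type descriptors from the example: name -> parent name.
SCALARS = {"char": ROOT, "float": "char", "double": "char", "int": "char"}

# Struct type descriptors: name -> [(field type, field offset), ...],
# with offsets in non-decreasing order.
STRUCTS = {
    "Inner": [("int", 0), ("float", 4)],
    "Outer": [("float", 0), ("double", 4), ("Inner", 12)],
}


def immediate_parent(base_ty, offset):
    """Return ImmediateParent(base_ty, offset), or None where undefined."""
    if base_ty in SCALARS:
        return (SCALARS[base_ty], 0) if offset == 0 else None
    if base_ty in STRUCTS:
        # Assumption: pick the last field starting at or before `offset`.
        candidates = [(ty, off) for ty, off in STRUCTS[base_ty] if off <= offset]
        if not candidates:
            return None
        field_ty, field_off = candidates[-1]
        return (field_ty, offset - field_off)
    return None  # the root has no parent


def reachable(start, target):
    """True if `target` is `start` or is found by following ImmediateParent."""
    node = start
    while node is not None:
        if node == target:
            return True
        node = immediate_parent(*node)
    return False


def may_alias(tag1, tag2):
    """Apply the aliasing rule to two (BaseTy, AccessTy, Offset) tags."""
    a = (tag1[0], tag1[2])
    b = (tag2[0], tag2[2])
    return reachable(a, b) or reachable(b, a)
```

With tag1 = ("Outer", "int", 12) and tag3 = ("float", "float", 0), the walk from ("Outer", 12) passes through ("Inner", 0), ("int", 0) and ("char", 0) without meeting ("float", 0), so may_alias(tag1, tag3) is False, whereas a char access reaches every tag in the example.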

Representation

The root node of a TBAA type hierarchy is an MDNode with 0 operands or with exactly one MDString operand.

Scalar type descriptors are represented as MDNodes with two operands. The first operand is an MDString denoting the name of the scalar type. LLVM does not assign meaning to the value of this operand; it only cares about it being an MDString. The second operand is an MDNode which points to the parent for said scalar type descriptor, which is either another scalar type descriptor or the TBAA root. Scalar type descriptors can have an optional third argument, but that must be the constant integer zero.

Struct type descriptors are represented as MDNodes with an odd number of operands greater than 1. The first operand is an MDString denoting the name of the struct type. Like in scalar type descriptors the actual value of this name operand is irrelevant to LLVM. After the name operand, the struct type descriptors have a sequence of alternating MDNode and ConstantInt operands. With N starting from 1, the 2N - 1 th operand, an MDNode, denotes a contained field, and the 2N th operand, a ConstantInt, is the offset of the said contained field. The offsets must be in non-decreasing order.

Access tags are represented as MDNodes with either 3 or 4 operands. The first operand is an MDNode pointing to the node representing the base type. The second operand is an MDNode pointing to the node representing the access type. The third operand is a ConstantInt that states the offset of the access. If a fourth field is present, it must be a ConstantInt valued at 0 or 1. If it is 1 then the access tag states that the location being accessed is “constant” (meaning pointsToConstantMemory should return true; see other useful AliasAnalysis methods). The TBAA root of the access type and the base type of an access tag must be the same, and that is the TBAA root of the access tag.

‘tbaa.struct’ Metadata

The llvm.memcpy intrinsic is often used to implement aggregate assignment operations in C and similar languages. However, it is defined to copy a contiguous region of memory, which is more than strictly necessary for aggregate types which contain holes due to padding. Also, it doesn’t contain any TBAA information about the fields of the aggregate.

!tbaa.struct metadata can describe which memory subregions in a memcpy are padding and what the TBAA tags of the struct are.

The current metadata format is very simple. !tbaa.struct metadata nodes are a list of operands which are in conceptual groups of three. For each group of three, the first operand gives the offset of a field in bytes, the second gives its size in bytes, and the third gives its tbaa tag. e.g.:

!4 = !{ i64 0, i64 4, !1, i64 8, i64 4, !2 }

This describes a struct with two fields. The first is at offset 0 bytes with size 4 bytes, and has tbaa tag !1. The second is at offset 8 bytes and has size 4 bytes and has tbaa tag !2.

Note that the fields need not be contiguous. In this example, there is a 4 byte gap between the two fields. This gap represents padding which does not carry useful data and need not be preserved.
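As a rough illustration of the groups-of-three layout, the following Python sketch splits a flat operand list into (offset, size, tag) triples and reports the byte ranges between fields that no triple covers. The function names are hypothetical and the tags are modelled as plain strings; trailing padding is not reported because the total size comes from the memcpy, not from the metadata.

```python
def decode_tbaa_struct(operands):
    """Group a flat operand list into (offset, size, tag) triples."""
    if len(operands) % 3 != 0:
        raise ValueError("operands must come in groups of three")
    return [tuple(operands[i:i + 3]) for i in range(0, len(operands), 3)]


def padding_gaps(fields):
    """Return [start, end) byte ranges not covered by any field."""
    gaps = []
    covered_up_to = 0
    for offset, size, _tag in sorted(fields, key=lambda field: field[0]):
        if offset > covered_up_to:
            gaps.append((covered_up_to, offset))
        covered_up_to = max(covered_up_to, offset + size)
    return gaps


# The !4 node from the example, with tags modelled as strings.
fields = decode_tbaa_struct([0, 4, "!1", 8, 4, "!2"])
print(fields)
print(padding_gaps(fields))
```

For the example node the only uncovered range is bytes 4 through 7, matching the 4 byte gap noted above.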

‘noalias’ and ‘alias.scope’ Metadata

noalias and alias.scope metadata provide the ability to specify generic noalias memory-access sets. This means that some collection of memory access instructions (loads, stores, memory-accessing calls, etc.) that carry noalias metadata can be specified not to alias with some other collection of memory access instructions that carry alias.scope metadata. If accesses from different collections alias, the behavior is undefined. Each type of metadata specifies a list of scopes where each scope has an id and a domain.

When evaluating an aliasing query, if for some domain, the set of scopes with that domain in one instruction’s alias.scope list is a subset of (or equal to) the set of scopes for that domain in another instruction’s noalias list, then the two memory accesses are assumed not to alias.

Because scopes in one domain don’t affect scopes in other domains, separate domains can be used to compose multiple independent noalias sets. This is used for example during inlining. As the noalias function parameters are turned into noalias scope metadata, a new domain is used every time the function is inlined.

The metadata identifying each domain is itself a list containing one or two entries. The first entry is the name of the domain. Note that if the name is a string then it can be combined across functions and translation units. A self-reference can be used to create globally unique domain names. A descriptive string may optionally be provided as a second list entry.

The metadata identifying each scope is also itself a list containing two or three entries. The first entry is the name of the scope. Note that if the name is a string then it can be combined across functions and translation units. A self-reference can be used to create globally unique scope names. A metadata reference to the scope’s domain is the second entry. A descriptive string may optionally be provided as a third list entry.

For example,

; Two scope domains:
!0 = !{!0}
!1 = !{!1}

; Some scopes in these domains:
!2 = !{!2, !0}
!3 = !{!3, !0}
!4 = !{!4, !1}

; Some scope lists:
!5 = !{!4} ; A list containing only scope !4
!6 = !{!4, !3, !2}
!7 = !{!3}

; These two instructions don't alias:
%0 = load float, ptr %c, align 4, !alias.scope !5
store float %0, ptr %arrayidx.i, align 4, !noalias !5

; These two instructions also don't alias (for domain !1, the set of scopes
; in the !alias.scope equals that in the !noalias list):
%2 = load float, ptr %c, align 4, !alias.scope !5
store float %2, ptr %arrayidx.i2, align 4, !noalias !6

; These two instructions may alias (for domain !0, the set of scopes in
; the !noalias list is not a superset of, or equal to, the scopes in the
; !alias.scope list):
%2 = load float, ptr %c, align 4, !alias.scope !6
store float %0, ptr %arrayidx.i, align 4, !noalias !7
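The per-domain subset rule can be checked against the three cases above with a short Python model. It is a sketch under stated assumptions, not the analysis LLVM runs: scopes and domains are modelled as strings, domain_of is a hypothetical mapping from each scope to its domain, and a domain is only considered when the alias.scope list actually contains a scope from it (otherwise the empty set would trivially be a subset).

```python
def assumed_no_alias(alias_scope, noalias, domain_of):
    """Return True if the subset rule lets the two accesses be assumed
    not to alias, given one access's alias.scope list and the other's
    noalias list."""
    for domain in {domain_of[scope] for scope in noalias}:
        scoped = {s for s in alias_scope if domain_of[s] == domain}
        excluded = {s for s in noalias if domain_of[s] == domain}
        # Assumption: skip domains the alias.scope list does not mention.
        if scoped and scoped <= excluded:
            return True
    return False


# Scopes !2 and !3 live in domain !0; scope !4 lives in domain !1.
domain_of = {"!2": "!0", "!3": "!0", "!4": "!1"}
list5 = ["!4"]
list6 = ["!4", "!3", "!2"]
list7 = ["!3"]

print(assumed_no_alias(list5, list5, domain_of))  # first pair
print(assumed_no_alias(list5, list6, domain_of))  # second pair
print(assumed_no_alias(list6, list7, domain_of))  # third pair
```

The third pair is the interesting one: in domain !0 the alias.scope side holds !3 and !2 while the noalias side holds only !3, so the subset test fails and the accesses may alias.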

‘fpmath’ Metadata

fpmath metadata may be attached to any instruction of floating-point type. It can be used to express the maximum acceptable error in the result of that instruction, in ULPs, thus potentially allowing the compiler to use a more efficient but less accurate method of computing it. ULP is defined as follows:

If x is a real number that lies between two finite consecutive floating-point numbers a and b, without being equal to one of them, then ulp(x) = |b - a|, otherwise ulp(x) is the distance between the two non-equal finite floating-point numbers nearest x. Moreover, ulp(NaN) is NaN.

The metadata node shall consist of a single positive float type number representing the maximum relative error, for example:

!0 = !{ float 2.5 } ; maximum acceptable inaccuracy is 2.5 ULPs

‘range’ Metadata

range metadata may be attached only to load, call and invoke of integer or vector of integer types. It expresses the possible ranges the loaded value or the value returned by the called function at this call site is in. If the loaded or returned value is not in the specified range, a poison value is returned instead. The ranges are represented with a flattened list of integers. The loaded value or the value returned is known to be in the union of the ranges defined by each consecutive pair. Each pair has the following properties:

  • The type must match the scalar type of the instruction.

  • The pair a, b represents the range [a, b).

  • Both a and b are constants.

  • The range is allowed to wrap.

  • The range should not represent the full or empty set. That is, a != b.

In addition, the pairs must be in signed order of the lower bound and they must be non-contiguous.

For vector-typed instructions, the range is applied element-wise.

Examples:

%a = load i8, ptr %x, align 1, !range !0 ; Can only be 0 or 1
%b = load i8, ptr %y, align 1, !range !1 ; Can only be 255 (-1), 0 or 1
%c = call i8 @foo(),       !range !2 ; Can only be 0, 1, 3, 4 or 5
%d = invoke i8 @bar() to label %cont
       unwind label %lpad, !range !3 ; Can only be -2, -1, 3, 4 or 5
%e = load <2 x i8>, ptr %x, !range !0 ; Can only be <0 or 1, 0 or 1>
...
!0 = !{ i8 0, i8 2 }
!1 = !{ i8 255, i8 2 }
!2 = !{ i8 0, i8 2, i8 3, i8 6 }
!3 = !{ i8 -2, i8 0, i8 3, i8 6 }
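To make the half-open, possibly wrapping ranges concrete, here is a small Python sketch that tests whether a value lies in the union of the pairs. The function name and the explicit bits parameter are assumptions of the sketch; values are compared as unsigned numbers modulo 2^bits, so -2 and 254 denote the same i8 value.

```python
def in_range_metadata(value, bounds, bits):
    """Return True if `value` is in the union of the [a, b) pairs in
    `bounds`, treating all numbers as `bits`-wide integers."""
    if not bounds or len(bounds) % 2 != 0:
        raise ValueError("bounds must be a non-empty list of pairs")
    mask = (1 << bits) - 1
    v = value & mask
    for lo, hi in zip(bounds[0::2], bounds[1::2]):
        lo &= mask
        hi &= mask
        if lo == hi:
            raise ValueError("a pair must not be the full or empty set")
        if lo < hi:
            inside = lo <= v < hi
        else:
            # Wrapping range: from lo up to the maximum, then 0 up to hi.
            inside = v >= lo or v < hi
        if inside:
            return True
    return False


# Enumerate the i8 values admitted by !1 = !{ i8 255, i8 2 }.
print([v for v in range(256) if in_range_metadata(v, [255, 2], 8)])
```

The same helper applied to the pairs of !3 admits 254 and 255 (that is, -2 and -1) along with 3, 4 and 5.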

‘absolute_symbol’ Metadata

absolute_symbol metadata may be attached to a global variable declaration. It marks the declaration as a reference to an absolute symbol, which causes the backend to use absolute relocations for the symbol even in position independent code, and expresses the possible ranges that the global variable’s address (not its value) is in, in the same format as range metadata, with the extension that the pair all-ones, all-ones may be used to represent the full set.

Example (assuming 64-bit pointers):

@a = external global i8, !absolute_symbol !0 ; Absolute symbol in range [0,256)
@b = external global i8, !absolute_symbol !1 ; Absolute symbol in range [0,2^64)
...
!0 = !{ i64 0, i64 256 }
!1 = !{ i64 -1, i64 -1 }

‘callees’ Metadata

callees metadata may be attached to indirect call sites. If callees metadata is attached to a call site, and any callee is not among the set of functions provided by the metadata, the behavior is undefined. The intent of this metadata is to facilitate optimizations such as indirect-call promotion. For example, in the code below, the call instruction may only target the add or sub functions:

%result = call i64 %binop(i64 %x, i64 %y), !callees !0
...
!0 = !{ptr @add, ptr @sub}

‘callback’ Metadata

callback metadata may be attached to a function declaration, or definition. (Call sites are excluded only due to the lack of a use case.) For ease of exposition, we’ll refer to the function annotated with metadata as a broker function. The metadata describes how the arguments of a call to the broker are in turn passed to the callback function specified by the metadata. Thus, the callback metadata provides a partial description of a call site inside the broker function with regards to the arguments of a call to the broker. The only semantic restriction on the broker function itself is that it is not allowed to inspect or modify arguments referenced in the callback metadata as pass-through to the callback function.

The broker is not required to actually invoke the callback function at runtime. However, the assumptions about not inspecting or modifying arguments that would be passed to the specified callback function still hold, even if the callback function is not dynamically invoked. The broker is allowed to invoke the callback function more than once per invocation of the broker. The broker is also allowed to invoke (directly or indirectly) the function passed as a callback through another use. Finally, the broker is also allowed to relay the callback callee invocation to a different thread.

The metadata is structured as follows: At the outer level, callback metadata is a list of callback encodings. Each encoding starts with a constant i64 which describes the argument position of the callback function in the call to the broker. The following elements, except the last, describe what arguments are passed to the callback function. Each element is again an i64 constant identifying the argument of the broker that is passed through, or i64 -1 to indicate an unknown or inspected argument. The order in which they are listed has to be the same in which they are passed to the callback callee. The last element of the encoding is a boolean which specifies how variadic arguments of the broker are handled. If it is true, all variadic arguments of the broker are passed through to the callback function after the arguments encoded explicitly before.

In the code below, the pthread_create function is marked as a broker through the !callback !1 metadata. In the example, there is only one callback encoding, namely !2, associated with the broker. This encoding identifies the callback function as the broker argument at position 2 (i64 2, counting from zero) and the sole argument of the callback function as the broker argument at position 3 (i64 3).

declare !callback !1 dso_local i32 @pthread_create(ptr, ptr, ptr, ptr)
...
!2 = !{i64 2, i64 3, i1 false}
!1 = !{!2}

Another example is shown below. The callback callee is the argument at position 2 of the __kmpc_fork_call function (i64 2, counting from zero). The callee is given two unknown values (each identified by an i64 -1) and afterwards all variadic arguments that are passed to the __kmpc_fork_call call (due to the final i1 true).

declare !callback !0 dso_local void @__kmpc_fork_call(ptr, i32, ptr, ...)
...
!1 = !{i64 2, i64 -1, i64 -1, i1 true}
!0 = !{!1}
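The two encodings above can be read mechanically, as this Python sketch shows. It is only an illustration: the function name, the use of None for an unknown (i64 -1) argument, and the string stand-ins for actual call operands are all assumptions, and positions are taken to count from zero.

```python
def describe_callback_call(encoding, broker_args, num_fixed_params):
    """Map the arguments of one broker call to the callee and arguments
    of the callback call that a callback encoding describes.

    `encoding` is (callee position, passed positions..., variadic flag);
    `num_fixed_params` is the number of non-variadic broker parameters.
    """
    callee_pos, *passed, variadic = encoding
    callee = broker_args[callee_pos]
    # -1 marks an argument whose value is unknown or inspected.
    args = [None if pos == -1 else broker_args[pos] for pos in passed]
    if variadic:
        # Variadic broker arguments follow the explicitly encoded ones.
        args.extend(broker_args[num_fixed_params:])
    return callee, args


# pthread_create(thread, attr, start_routine, arg) with !{i64 2, i64 3, i1 false}
print(describe_callback_call([2, 3, False],
                             ["thread", "attr", "start_routine", "arg"], 4))

# __kmpc_fork_call(loc, argc, microtask, ...) with !{i64 2, i64 -1, i64 -1, i1 true}
print(describe_callback_call([2, -1, -1, True],
                             ["loc", 2, "microtask", "a", "b"], 3))
```

In the second call the two variadic operands a and b end up after the two unknown arguments, which is the ordering the final i1 true requests.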

‘exclude’ Metadata

exclude metadata may be attached to a global variable to signify that its section should not be included in the final executable or shared library. This option is only valid for global variables with an explicit section targeting ELF or COFF. This is done using the SHF_EXCLUDE flag on ELF targets and the IMAGE_SCN_LNK_REMOVE and IMAGE_SCN_MEM_DISCARDABLE flags for COFF targets. Additionally, this metadata is only used as a flag, so the associated node must be empty. The explicit section should not conflict with any other sections that the user does not want removed after linking.

@object = private constant [1 x i8] c"\00", section ".foo", !exclude !0
...
!0 = !{}

‘unpredictable’ Metadata

unpredictable metadata may be attached to any branch or switch instruction. It can be used to express the unpredictability of control flow. Similar to the llvm.expect intrinsic, it may be used to alter optimizations related to compare and branch instructions. The metadata is treated as a boolean value; if it exists, it signals that the branch or switch that it is attached to is completely unpredictable.

‘dereferenceable’ Metadata

The existence of the !dereferenceable metadata on the instruction tells the optimizer that the value loaded is known to be dereferenceable, otherwise the behavior is undefined. The number of bytes known to be dereferenceable is specified by the integer value in the metadata node. This is analogous to the ‘dereferenceable’ attribute on parameters and return values.

‘dereferenceable_or_null’ Metadata

The existence of the !dereferenceable_or_null metadata on the instruction tells the optimizer that the value loaded is known to be either dereferenceable or null, otherwise the behavior is undefined. The number of bytes known to be dereferenceable is specified by the integer value in the metadata node. This is analogous to the ‘dereferenceable_or_null’ attribute on parameters and return values.

‘llvm.loop’

It is sometimes useful to attach information to loop constructs. Currently, loop metadata is implemented as metadata attached to the branch instruction in the loop latch block. The loop metadata node is a list of other metadata nodes, each representing a property of the loop. Usually, the first item of the property node is a string. For example, the llvm.loop.unroll.count suggests an unroll factor to the loop unroller:

br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
...
!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll.enable"}
!2 = !{!"llvm.loop.unroll.count", i32 4}

For legacy reasons, the first item of a loop metadata node must be a reference to itself. Before the advent of the ‘distinct’ keyword, this forced the preservation of otherwise identical metadata nodes. Since the loop-metadata node can be attached to multiple nodes, the ‘distinct’ keyword has become unnecessary.

Prior to the property nodes, one or two DILocation (debug location) nodes can be present in the list. The first, if present, identifies the source-code location where the loop begins. The second, if present, identifies the source-code location where the loop ends.

Loop metadata nodes cannot be used as unique identifiers. They are neither persistent for the same loop through transformations nor necessarily unique to just one loop.

‘llvm.loop.disable_nonforced’

This metadata disables all optional loop transformations unless explicitly instructed using other transformation metadata such as llvm.loop.unroll.enable. That is, no heuristic will try to determine whether a transformation is profitable. The purpose is to prevent the loop from being transformed into a different loop before an explicitly requested (forced) transformation is applied. For instance, loop fusion can make other transformations impossible. Mandatory loop canonicalizations such as loop rotation are still applied.

It is recommended to use this metadata in addition to any llvm.loop.* transformation directive. Also, any loop should have at most one directive applied to it (and a sequence of transformations built using followup-attributes). Otherwise, which transformation will be applied depends on implementation details such as the pass pipeline order.

See Code Transformation Metadata for details.
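As a minimal sketch of the recommendation above, the loop below carries llvm.loop.disable_nonforced together with exactly one forced directive (llvm.loop.unroll.count). The function @zero_fill and all value names are hypothetical, chosen only for illustration.

```llvm
define void @zero_fill(ptr %a, i64 %n) {
entry:
  %empty = icmp eq i64 %n, 0
  br i1 %empty, label %exit, label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %slot = getelementptr inbounds i32, ptr %a, i64 %i
  store i32 0, ptr %slot
  %i.next = add nuw i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  ; The loop metadata is attached to the branch in the latch block.
  br i1 %done, label %exit, label %loop, !llvm.loop !0

exit:
  ret void
}

; The first operand is the self-reference; property nodes follow.
!0 = distinct !{!0, !1, !2}
; Suppress heuristic-driven (non-forced) transformations.
!1 = !{!"llvm.loop.disable_nonforced"}
; The single explicitly requested transformation.
!2 = !{!"llvm.loop.unroll.count", i32 4}
```

With only one directive present, the result should not depend on the order in which loop passes happen to run.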

‘llvm.loop.vectorize’ and ‘llvm.loop.interleave’

Metadata prefixed with llvm.loop.vectorize or llvm.loop.interleave are used to control per-loop vectorization and interleaving parameters such as vectorization width and interleave count. These metadata should be used in conjunction with llvm.loop loop identification metadata. The llvm.loop.vectorize and llvm.loop.interleave metadata are only optimization hints and the optimizer will only interleave and vectorize loops if it believes it is safe to do so. The llvm.loop.parallel_accesses metadata, which contains information about loop-carried memory dependencies, can be helpful in determining the safety of these transformations.

‘llvm.loop.interleave.count’ Metadata

This metadata suggests an interleave count to the loop interleaver. The first operand is the string llvm.loop.interleave.count and the second operand is an integer specifying the interleave count. For example:

!0 = !{!"llvm.loop.interleave.count", i32 4}

Note that setting llvm.loop.interleave.count to 1 disables interleaving multiple iterations of the loop. If llvm.loop.interleave.count is set to 0 then the interleave count will be determined automatically.

‘llvm.loop.vectorize.enable’ Metadata

This metadata selectively enables or disables vectorization for the loop. The first operand is the string llvm.loop.vectorize.enable and the second operand is a bit. If the bit operand value is 1, vectorization is enabled. A value of 0 disables vectorization:

!0 = !{!"llvm.loop.vectorize.enable", i1 0}
!1 = !{!"llvm.loop.vectorize.enable", i1 1}

‘llvm.loop.vectorize.predicate.enable’ Metadata

This metadata selectively enables or disables creating predicated instructions for the loop, which can enable folding of the scalar epilogue loop into the main loop. The first operand is the string llvm.loop.vectorize.predicate.enable and the second operand is a bit. If the bit operand value is 1, predication is enabled. A value of 0 disables predication:

!0 = !{!"llvm.loop.vectorize.predicate.enable", i1 0}
!1 = !{!"llvm.loop.vectorize.predicate.enable", i1 1}

‘llvm.loop.vectorize.scalable.enable’ Metadata

This metadata selectively enables or disables scalable vectorization for the loop, and only has any effect if vectorization for the loop is already enabled. The first operand is the string llvm.loop.vectorize.scalable.enable and the second operand is a bit. If the bit operand value is 1, scalable vectorization is enabled, whereas a value of 0 reverts to the default fixed width vectorization:

!0 = !{!"llvm.loop.vectorize.scalable.enable", i1 0}
!1 = !{!"llvm.loop.vectorize.scalable.enable", i1 1}

‘llvm.loop.vectorize.width’ Metadata

This metadata sets the target width of the vectorizer. The first operand is the string llvm.loop.vectorize.width and the second operand is an integer specifying the width. For example:

!0 = !{!"llvm.loop.vectorize.width", i32 4}

Note that setting llvm.loop.vectorize.width to 1 disables vectorization of the loop. If llvm.loop.vectorize.width is set to 0 or if the loop does not have this metadata the width will be determined automatically.
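The individual hints above are normally combined as property nodes of a single llvm.loop node. The sketch below shows one possible combination; the function @add_one, the value names, and the chosen width and interleave count are hypothetical.

```llvm
define void @add_one(ptr %a, i64 %n) {
entry:
  %empty = icmp eq i64 %n, 0
  br i1 %empty, label %exit, label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %slot = getelementptr inbounds i32, ptr %a, i64 %i
  %old = load i32, ptr %slot
  %new = add i32 %old, 1
  store i32 %new, ptr %slot
  %i.next = add nuw i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop, !llvm.loop !0

exit:
  ret void
}

; One loop node listing three vectorizer hints.
!0 = distinct !{!0, !1, !2, !3}
; Request vectorization of this loop.
!1 = !{!"llvm.loop.vectorize.enable", i1 1}
; Suggest 4 lanes; 0 would leave the choice to the vectorizer.
!2 = !{!"llvm.loop.vectorize.width", i32 4}
; Suggest interleaving 2 iterations; 1 would disable interleaving.
!3 = !{!"llvm.loop.interleave.count", i32 2}
```

These remain hints: the optimizer still has to establish on its own that vectorizing the loop is safe.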

‘llvm.loop.vectorize.followup_vectorized’ Metadata

This metadata defines which loop attributes the vectorized loop will have. See Code Transformation Metadata for details.

‘llvm.loop.vectorize.followup_epilogue’ Metadata

This metadata defines which loop attributes the epilogue will have. The epilogue is not vectorized and is executed when either the vectorized loop is not known to preserve semantics (because e.g., it processes two arrays that are found to alias by a runtime check) or for the last iterations that do not fill a complete set of vector lanes. See Transformation Metadata for details.

‘llvm.loop.vectorize.followup_all’ Metadata

Attributes in the metadata will be added to both the vectorized and epilogue loop. See Transformation Metadata for details.

‘llvm.loop.unroll’

Metadata prefixed with llvm.loop.unroll are loop unrolling optimization hints such as the unroll factor. llvm.loop.unroll metadata should be used in conjunction with llvm.loop loop identification metadata. The llvm.loop.unroll metadata are only optimization hints and the unrolling will only be performed if the optimizer believes it is safe to do so.

‘llvm.loop.unroll.count’ Metadata

This metadata suggests an unroll factor to the loop unroller. The first operand is the string llvm.loop.unroll.count and the second operand is a positive integer specifying the unroll factor. For example:

!0 = !{!"llvm.loop.unroll.count", i32 4}

If the trip count of the loop is less than the unroll count the loop will be partially unrolled.

‘llvm.loop.unroll.disable’ Metadata

This metadata disables loop unrolling. The metadata has a single operand which is the string llvm.loop.unroll.disable. For example:

!0 = !{!"llvm.loop.unroll.disable"}

‘llvm.loop.unroll.runtime.disable’ Metadata

This metadata disables runtime loop unrolling. The metadata has a single operand which is the string llvm.loop.unroll.runtime.disable. For example:

!0 = !{!"llvm.loop.unroll.runtime.disable"}

‘llvm.loop.unroll.enable’ Metadata

This metadata suggests that the loop should be fully unrolled if the trip count is known at compile time and partially unrolled if the trip count is not known at compile time. The metadata has a single operand which is the string llvm.loop.unroll.enable. For example:

!0 = !{!"llvm.loop.unroll.enable"}

‘llvm.loop.unroll.full’ Metadata

This metadata suggests that the loop should be unrolled fully. The metadata has a single operand which is the string llvm.loop.unroll.full. For example:

!0 = !{!"llvm.loop.unroll.full"}

‘llvm.loop.unroll.followup’ Metadata

This metadata defines which loop attributes the unrolled loop will have. See Transformation Metadata for details.

‘llvm.loop.unroll.followup_remainder’ Metadata

This metadata defines which loop attributes the remainder loop after partial/runtime unrolling will have. See Transformation Metadata for details.

‘llvm.loop.unroll_and_jam’

This metadata is treated very similarly to the llvm.loop.unroll metadata above, but affects the unroll and jam pass. In addition, any loop with llvm.loop.unroll metadata but no llvm.loop.unroll_and_jam metadata will disable unroll and jam (so llvm.loop.unroll metadata will be left to the unroller, and llvm.loop.unroll.disable metadata will disable unroll and jam too).

The metadata for unroll and jam otherwise is the same as for unroll. llvm.loop.unroll_and_jam.enable, llvm.loop.unroll_and_jam.disable and llvm.loop.unroll_and_jam.count do the same as for unroll. llvm.loop.unroll_and_jam.full is not supported. Again, these are only hints and the normal safety checks will still be performed.

‘llvm.loop.unroll_and_jam.count’ Metadata

This metadata suggests an unroll and jam factor to use, similarly to llvm.loop.unroll.count. The first operand is the string llvm.loop.unroll_and_jam.count and the second operand is a positive integer specifying the unroll factor. For example:

!0 = !{!"llvm.loop.unroll_and_jam.count", i32 4}

If the trip count of the loop is less than the unroll count the loop will be partially unrolled and jammed.

‘llvm.loop.unroll_and_jam.disable’ Metadata

This metadata disables loop unroll and jamming. The metadata has a single operand which is the string llvm.loop.unroll_and_jam.disable. For example:

!0 = !{!"llvm.loop.unroll_and_jam.disable"}

‘llvm.loop.unroll_and_jam.enable’ Metadata

This metadata suggests that the loop should be fully unrolled and jammed if the trip count is known at compile time and partially unrolled if the trip count is not known at compile time. The metadata has a single operand which is the string llvm.loop.unroll_and_jam.enable. For example:

!0 = !{!"llvm.loop.unroll_and_jam.enable"}

‘llvm.loop.unroll_and_jam.followup_outer’ Metadata

This metadata defines which loop attributes the outer unrolled loop will have. See Transformation Metadata for details.

‘llvm.loop.unroll_and_jam.followup_inner’ Metadata

This metadata defines which loop attributes the inner jammed loop will have. See Transformation Metadata for details.

‘llvm.loop.unroll_and_jam.followup_remainder_outer’ Metadata

This metadata defines which attributes the epilogue of the outer loop will have. This loop is usually unrolled, meaning there is no such loop. This attribute will be ignored in this case. See Transformation Metadata for details.

‘llvm.loop.unroll_and_jam.followup_remainder_inner’ Metadata

This metadata defines which attributes the inner loop of the epilogue will have. The outer epilogue will usually be unrolled, meaning there can be multiple inner remainder loops. See Transformation Metadata for details.

‘llvm.loop.unroll_and_jam.followup_all’ Metadata

Attributes specified in the metadata are added to all llvm.loop.unroll_and_jam.* loops. See Transformation Metadata for details.

‘llvm.loop.licm_versioning.disable’ Metadata

This metadata indicates that the loop should not be versioned for the purpose of enabling loop-invariant code motion (LICM). The metadata has a single operand which is the string llvm.loop.licm_versioning.disable. For example:

!0 = !{!"llvm.loop.licm_versioning.disable"}

‘llvm.loop.distribute.enable’ Metadata

Loop distribution allows splitting a loop into multiple loops. Currently, this is only performed if the entire loop cannot be vectorized due to unsafe memory dependencies. The transformation will attempt to isolate the unsafe dependencies into their own loop.

This metadata can be used to selectively enable or disable distribution of the loop. The first operand is the string llvm.loop.distribute.enable and the second operand is a bit. If the bit operand value is 1, distribution is enabled. A value of 0 disables distribution:

!0 = !{!"llvm.loop.distribute.enable", i1 0}
!1 = !{!"llvm.loop.distribute.enable", i1 1}

This metadata should be used in conjunction with llvm.loop loop identification metadata.

‘llvm.loop.distribute.followup_coincident’ Metadata

This metadata defines which attributes the extracted loops with no cyclic dependencies (i.e. loops that can be vectorized) will have. See Transformation Metadata for details.

‘llvm.loop.distribute.followup_sequential’ Metadata

This metadata defines which attributes the isolated loops with unsafe memory dependencies will have. See Transformation Metadata for details.

‘llvm.loop.distribute.followup_fallback’ Metadata

If loop versioning is necessary, this metadata defines the attributes the non-distributed fallback version will have. See Transformation Metadata for details.

‘llvm.loop.distribute.followup_all’ Metadata

The attributes in this metadata are added to all followup loops of the loop distribution pass. See Transformation Metadata for details.

‘llvm.licm.disable’ Metadata

This metadata indicates that loop-invariant code motion (LICM) should not be performed on this loop. The metadata has a single operand which is the string llvm.licm.disable. For example:

!0 = !{!"llvm.licm.disable"}

Note that although it operates per loop it isn’t given the llvm.loop prefix as it is not affected by the llvm.loop.disable_nonforced metadata.

‘llvm.access.group’ Metadata

llvm.access.group metadata can be attached to any instruction that potentially accesses memory. It can point to a single distinct metadata node, which we call an access group. This node represents all memory access instructions referring to it via llvm.access.group. When an instruction belongs to multiple access groups, it can also point to a list of access groups, illustrated by the following example.

%val = load i32, ptr %arrayidx, !llvm.access.group !0
...
!0 = !{!1, !2}
!1 = distinct !{}
!2 = distinct !{}

It is illegal for the list node to be empty since it might be confused with an access group.

The access group metadata node must be ‘distinct’ to avoid collapsing multiple access groups by content. An access group metadata node must always be empty, which can be used to distinguish an access group metadata node from a list of access groups. Being empty avoids the situation that the content must be updated which, because metadata is immutable by design, would require finding and updating all references to the access group node.

The access group can be used to refer to a memory access instruction without pointing to it directly (which is not possible in global metadata). Currently, the only metadata making use of it is llvm.loop.parallel_accesses.

‘llvm.loop.parallel_accesses’ Metadata

The llvm.loop.parallel_accesses metadata refers to one or more access group metadata nodes (see llvm.access.group). It denotes that no loop-carried memory dependence exists between it and other instructions in the loop with this metadata.

Let m1 and m2 be two instructions that have llvm.access.group metadata referring to the access groups g1 and g2, respectively (which might be identical). If a loop contains both access groups in its llvm.loop.parallel_accesses metadata, then the compiler can assume that there is no dependency between m1 and m2 carried by this loop. Instructions that belong to multiple access groups are considered to have this property if at least one of the access groups matches the llvm.loop.parallel_accesses list.

If all memory-accessing instructions in a loop have llvm.access.group metadata that each refer to one of the access groups of a loop’s llvm.loop.parallel_accesses metadata, then the loop has no loop-carried memory dependencies and is considered to be a parallel loop. If there is a loop-carried dependency, the behavior is undefined.

Note that if not all memory access instructions belong to an access group referred to by llvm.loop.parallel_accesses, then the loop must not be considered trivially parallel. Additional memory dependence analysis is required to make that determination. As a fail-safe mechanism, this causes loops that were originally parallel to be considered sequential (if optimization passes that are unaware of the parallel semantics insert new memory instructions into the loop body).

Example of a loop that is considered parallel due to its correct use of both llvm.access.group and llvm.loop.parallel_accesses metadata types.

for.body:
  ...
  %val0 = load i32, ptr %arrayidx, !llvm.access.group !1
  ...
  store i32 %val0, ptr %arrayidx1, !llvm.access.group !1
  ...
  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

for.end:
...
!0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
!1 = distinct !{}

It is also possible to have nested parallel loops:

outer.for.body:
  ...
  %val1 = load i32, ptr %arrayidx3, !llvm.access.group !4
  ...
  br label %inner.for.body

inner.for.body:
  ...
  %val0 = load i32, ptr %arrayidx1, !llvm.access.group !3
  ...
  store i32 %val0, ptr %arrayidx2, !llvm.access.group !3
  ...
  br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1

inner.for.end:
  ...
  store i32 %val1, ptr %arrayidx4, !llvm.access.group !4
  ...
  br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2

outer.for.end:                                          ; preds = %for.body
...
!1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !3}}     ; metadata for the inner loop
!2 = distinct !{!2, !{!"llvm.loop.parallel_accesses", !3, !4}} ; metadata for the outer loop
!3 = distinct !{} ; access group for instructions in the inner loop (which are implicitly contained in outer loop as well)
!4 = distinct !{} ; access group for instructions in the outer, but not the inner loop

‘llvm.loop.mustprogress’ Metadata

The llvm.loop.mustprogress metadata indicates that this loop is required to terminate, unwind, or interact with the environment in an observable way e.g. via a volatile memory access, I/O, or other synchronization. If such a loop is not found to interact with the environment in an observable way, the loop may be removed. This corresponds to the mustprogress function attribute.
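A minimal sketch of a loop that this metadata would permit the optimizer to remove: the loop below never terminates and has no observable effect. The function name @spin is hypothetical.

```llvm
define void @spin() {
entry:
  br label %loop

loop:
  ; No memory access, I/O, or synchronization happens in this loop, and it
  ; never exits, so it violates the progress requirement stated by !1.
  br label %loop, !llvm.loop !0
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.mustprogress"}
```

Without the metadata (and without a mustprogress attribute on the function), such an infinite loop would have to be preserved.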

‘irr_loop’ Metadata

irr_loop metadata may be attached to the terminator instruction of a basic block that’s an irreducible loop header (note that an irreducible loop has more than one header basic block). If irr_loop metadata is attached to the terminator instruction of a basic block that is not really an irreducible loop header, the behavior is undefined. The intent of this metadata is to improve the accuracy of the block frequency propagation. For example, in the code below, the block header0 may have a loop header weight (relative to the other headers of the irreducible loop) of 100:

header0:
...
br i1 %cmp, label %t1, label %t2, !irr_loop !0

...
!0 = !{!"loop_header_weight", i64 100}

Irreducible loop header weights are typically based on profile data.

‘invariant.group’ Metadata

The experimental invariant.group metadata may be attached to load/store instructions referencing a single metadata with no entries. The existence of the invariant.group metadata on the instruction tells the optimizer that every load and store to the same pointer operand can be assumed to load or store the same value (but see the llvm.launder.invariant.group intrinsic which affects when two pointers are considered the same). Pointers returned by bitcast or getelementptr with only zero indices are considered the same.

Examples:

@unknownPtr = external global i8
...
%ptr = alloca i8
store i8 42, ptr %ptr, !invariant.group !0
call void @foo(ptr %ptr)

%a = load i8, ptr %ptr, !invariant.group !0 ; Can assume that value under %ptr didn't change
call void @foo(ptr %ptr)

%newPtr = call ptr @getPointer(ptr %ptr)
%c = load i8, ptr %newPtr, !invariant.group !0 ; Can't assume anything, because we only have information about %ptr

%unknownValue = load i8, ptr @unknownPtr
store i8 %unknownValue, ptr %ptr, !invariant.group !0 ; Can assume that %unknownValue == 42

call void @foo(ptr %ptr)
%newPtr2 = call ptr @llvm.launder.invariant.group.p0(ptr %ptr)
%d = load i8, ptr %newPtr2, !invariant.group !0 ; Can't step through launder.invariant.group to get value of %ptr

...
declare void @foo(ptr)
declare ptr @getPointer(ptr)
declare ptr @llvm.launder.invariant.group.p0(ptr)

!0 = !{}

The invariant.group metadata must be dropped when replacing one pointer by another based on aliasing information. This is because invariant.group is tied to the SSA value of the pointer operand.

%v = load i8, ptr %x, !invariant.group !0
; if %x mustalias %y then we can replace the above instruction with
%v = load i8, ptr %y

Note that this is an experimental feature, which means that its semantics might change in the future.

‘type’ Metadata

See Type Metadata.

‘associated’ Metadata

The associated metadata may be attached to a global variable definition with a single argument that references a global object (optionally through an alias).

This metadata lowers to the ELF section flag SHF_LINK_ORDER which prevents discarding of the global variable in linker GC unless the referenced object is also discarded. The linker support for this feature is spotty. For best compatibility, globals carrying this metadata should:

  • Be in @llvm.compiler.used.

  • If the referenced global variable is in a comdat, be in the same comdat.

!associated cannot express a many-to-one relationship. A global variable with the metadata should generally not be referenced by a function: the function may be inlined into other functions, leading to more references to the metadata. Ideally we would want to keep metadata alive as long as any inline location is alive, but this many-to-one relationship is not representable. Moreover, if the metadata is retained while the function is discarded, the linker will report an error of a relocation referencing a discarded section.

The metadata is often used with an explicit section consisting of valid C identifiers so that the runtime can find the metadata section with linker-defined encapsulation symbols __start_<section_name> and __stop_<section_name>.

It does not have any effect on non-ELF targets.

Example:

$a = comdat any
@a = global i32 1, comdat $a
@b = internal global i32 2, comdat $a, section "abc", !associated !0
!0 = !{ptr @a}

‘prof’ Metadata

The prof metadata is used to record profile data in the IR. The first operand of the metadata node indicates the profile metadata type. There are currently 3 types: branch_weights, function_entry_count, and VP.

branch_weights

Branch weight metadata attached to a branch, select, switch or call instruction represents the likelihood of the associated branch being taken. For more information, see LLVM Branch Weight Metadata.

function_entry_count

Function entry count metadata can be attached to function definitions to record the number of times the function is called. Used with BFI information, it is also used to derive the basic block profile count. For more information, see LLVM Branch Weight Metadata.
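As a brief illustration of the two kinds described so far, the sketch below attaches a function_entry_count node to a function definition and a branch_weights node to a conditional branch. The function @clamp_negative and the counts are hypothetical; the operand layout shown (a type string followed by integer operands) is assumed to follow the LLVM Branch Weight Metadata document.

```llvm
define i32 @clamp_negative(i32 %x) !prof !0 {
entry:
  %neg = icmp slt i32 %x, 0
  ; One weight per successor, in successor order: %is.neg, then %done.
  br i1 %neg, label %is.neg, label %done, !prof !1

is.neg:
  br label %done

done:
  %r = phi i32 [ 0, %is.neg ], [ %x, %entry ]
  ret i32 %r
}

; The function was entered 1000 times in the profile.
!0 = !{!"function_entry_count", i64 1000}
; The branch went to %is.neg far less often than to %done.
!1 = !{!"branch_weights", i32 10, i32 990}
```

The weights are relative to each other, so only their ratio matters for the branch.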

VP

VP (value profile) metadata can be attached to instructions that have value profile information. Currently this is indirect calls (where it records the hottest callees) and calls to memory intrinsics such as memcpy, memmove, and memset (where it records the hottest byte lengths).

Each VP metadata node contains the “VP” string, then a uint32_t value for the value profiling kind, a uint64_t value for the total number of times the instruction is executed, followed by uint64_t value and execution count pairs. The value profiling kind is 0 for indirect call targets and 1 for memory operations. For indirect call targets, each profile value is a hash of the callee function name, and for memory operations each value is the byte length.

Note that the value counts do not need to add up to the total count listed in the third operand (in practice only the top hottest values are tracked and reported).

Indirect call example:

call void %f(), !prof !1
!1 = !{!"VP", i32 0, i64 1600, i64 7651369219802541373, i64 1030, i64 -4377547752858689819, i64 410}

Note that the VP type is 0 (the second operand), which indicates this is an indirect call value profile data. The third operand indicates that the indirect call executed 1600 times. The 4th and 6th operands give the hashes of the 2 hottest target functions’ names (this is the same hash used to represent function names in the profile database), and the 5th and 7th operands give the number of times each of the respective preceding target functions was called.
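For comparison, a memory-operation value profile uses kind 1 and byte lengths as the profiled values. The sketch below is an assumption-laden illustration: the function @copy and all counts are hypothetical, and the intrinsic is declared without its usual parameter attributes for brevity.

```llvm
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @copy(ptr %dst, ptr %src, i64 %len) {
entry:
  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 %len, i1 false), !prof !0
  ret void
}

; "VP", kind 1 (memory operation), total execution count 1000, followed by
; (byte length, count) pairs: 16 bytes seen 700 times, 64 bytes seen 250 times.
!0 = !{!"VP", i32 1, i64 1000, i64 16, i64 700, i64 64, i64 250}
```

Here the listed counts sum to 950 rather than 1000, which is permitted because only the hottest values are recorded.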

‘annotation’ Metadata

The annotation metadata can be used to attach a tuple of annotation strings or a tuple of a tuple of annotation strings to any instruction. This metadata does not impact the semantics of the program and may only be used to provide additional insight about the program and transformations to users.

Example:

%a.addr = alloca ptr, align 8, !annotation !0
!0 = !{!"auto-init"}

Embedding tuple of strings example:

%a.ptr = getelementptr ptr, ptr %base, i64 0, !annotation !0
!0 = !{!1}
!1 = !{!"gep offset", !"0"}

‘func_sanitize’ Metadata

The func_sanitize metadata is used to attach two values for the function sanitizer instrumentation. The first value is the ubsan function signature. The second value is the address of the proxy variable which stores the address of the RTTI descriptor. If prologue and ‘func_sanitize’ are used at the same time, prologue is emitted before ‘func_sanitize’ in the output.

Example:

@__llvm_rtti_proxy = private unnamed_addr constant ptr @_ZTIFvvE
define void @_Z3funv() !func_sanitize !0 {
  ret void
}
!0 = !{i32 846595819, ptr @__llvm_rtti_proxy}

‘kcfi_type’ Metadata

The kcfi_type metadata can be used to attach a type identifier to functions that can be called indirectly. The type data is emitted before the function entry in the assembly. Indirect calls with the kcfi operand bundle will emit a check that compares the type identifier to the metadata.

Example:

define dso_local i32 @f() !kcfi_type !0 {
  ret i32 0
}
!0 = !{i32 12345678}

Clang emits kcfi_type metadata nodes for address-taken functions with -fsanitize=kcfi.
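The sketch below pairs a function carrying kcfi_type with an indirect call that carries the kcfi operand bundle mentioned above. The functions @target and @caller are hypothetical, and the bundle is assumed to take the expected type identifier as a single i32 constant operand.

```llvm
define i32 @target() !kcfi_type !0 {
  ret i32 0
}

define i32 @caller(ptr %fp) {
  ; The bundle operand is the type identifier the callee is expected to carry.
  %r = call i32 %fp() [ "kcfi"(i32 12345678) ]
  ret i32 %r
}

!0 = !{i32 12345678}
```

If %fp points at @target, the identifier in the bundle matches the one in the metadata and the emitted check would be expected to pass.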

‘pcsections’ Metadata

The pcsections metadata can be attached to instructions and functions, for which addresses, viz. program counters (PCs), are to be emitted in specially encoded binary sections. More details can be found in the PC Sections Metadata documentation.

‘memprof’ Metadata

The memprof metadata is used to record memory profile data on heap allocation calls. Multiple context-sensitive profiles can be represented with a single memprof metadata attachment.

Example:

%call = call ptr @_Znam(i64 10), !memprof !0, !callsite !5
!0 = !{!1, !3}
!1 = !{!2, !"cold"}
!2 = !{i64 4854880825882961848, i64 1905834578520680781}
!3 = !{!4, !"notcold"}
!4 = !{i64 4854880825882961848, i64 -6528110295079665978}
!5 = !{i64 4854880825882961848}

Each operand in the memprof metadata attachment describes the profiled behavior of memory allocated by the associated allocation for a given context. In the above example, there were 2 profiled contexts, one allocating memory that was typically cold and one allocating memory that was typically not cold.

The format of the metadata describing a context specific profile (e.g. !1 and !3 above) requires a first operand that is a metadata node describing the context, followed by a list of string metadata tags describing the profile behavior (e.g. cold and notcold above). The metadata nodes describing the context (e.g. !2 and !4 above) are unique ids corresponding to callsites, which can be matched to associated IR calls via callsite metadata. In practice these ids are formed via a hash of the callsite’s debug info, and the associated call may be in a different module. The contexts are listed in order from leaf-most call (the allocation itself) to the outermost callsite context required for uniquely identifying the described profile behavior (note this may not be the top of the profiled call stack).

‘callsite’ Metadata

The callsite metadata is used to identify callsites involved in memory profile contexts described in memprof metadata.

It is attached both to the profiled allocation calls (see the example in memprof metadata) and to other callsites in profiled contexts described in heap allocation memprof metadata.

Example:

%call = call ptr @_Z1Bb(void), !callsite !0
!0 = !{i64 -6528110295079665978, i64 5462047985461644151}

Each operand in the callsite metadata attachment is a unique id corresponding to a callsite (possibly inlined). In practice these ids are formed via a hash of the callsite’s debug info. If the call was not inlined into any callers it will contain a single operand (id). If it was inlined it will contain a list of ids, including the ids of the callsites in the full inline sequence, in order from the leaf-most call’s id to the outermost inlined call.

‘noalias.addrspace’ Metadata

The noalias.addrspace metadata is used to identify memory operations which cannot access objects allocated in a range of address spaces. It is attached to memory instructions, including atomicrmw, cmpxchg, and call instructions.

This follows the same form as range metadata, except the field entries must be of type i32. The values are interpreted as the same numeric address spaces that apply to IR values.

Example:

; %ptr cannot point to an object allocated in addrspace(5)
%rmw.valid = atomicrmw and ptr %ptr, i64 %value seq_cst, !noalias.addrspace !0

; Undefined behavior. The underlying object is allocated in one of the listed
; address spaces.
%alloca = alloca i64, addrspace(5)
%alloca.cast = addrspacecast ptr addrspace(5) %alloca to ptr
%rmw.ub = atomicrmw and ptr %alloca.cast, i64 %value seq_cst, !noalias.addrspace !0

!0 = !{i32 5, i32 6} ; Exclude addrspace(5) only

This is intended for use on targets with a notion of generic address spaces, which at runtime resolve to different physical memory spaces. The interpretation of the address space values is target specific. The behavior is undefined if the runtime memory address does resolve to an object defined in one of the indicated address spaces.

Module Flags Metadata

Information about the module as a whole is difficult to convey to LLVM’s subsystems. The LLVM IR isn’t sufficient to transmit this information. The llvm.module.flags named metadata exists in order to facilitate this. These flags are in the form of key / value pairs, much like a dictionary, making it easy for any subsystem that cares about a flag to look it up.

The llvm.module.flags metadata contains a list of metadata triplets. Each triplet has the following form:

  • The first element is a behavior flag, which specifies the behavior when two (or more) modules are merged together and two (or more) metadata with the same ID are encountered. The supported behaviors are described below.

  • The second element is a metadata string that is a unique ID for the metadata. Each module may only have one flag entry for each unique ID (not including entries with the Require behavior).

  • The third element is the value of the flag.

When two (or more) modules are merged together, the resulting llvm.module.flags metadata is the union of the modules’ flags. That is, for each unique metadata ID string, there will be exactly one entry in the merged module’s llvm.module.flags metadata table, and the value for that entry will be determined by the merge behavior flag, as described below. The only exception is that entries with the Require behavior are always preserved.

The following behaviors are supported:

Value  Behavior

1      Error: Emits an error if two values disagree, otherwise the resulting value is that of the operands.

2      Warning: Emits a warning if two values disagree. The result value will be the operand for the flag from the first module being linked, unless the other module uses Min or Max, in which case the result will be Min (with the min value) or Max (with the max value), respectively.

3      Require: Adds a requirement that another module flag be present and have a specified value after linking is performed. The value must be a metadata pair, where the first element of the pair is the ID of the module flag to be restricted, and the second element of the pair is the value the module flag should be restricted to. This behavior can be used to restrict the allowable results (via triggering of an error) of linking IDs with the Override behavior.

4      Override: Uses the specified value, regardless of the behavior or value of the other module. If both modules specify Override, but the values differ, an error will be emitted.

5      Append: Appends the two values, which are required to be metadata nodes.

6      AppendUnique: Appends the two values, which are required to be metadata nodes. However, duplicate entries in the second list are dropped during the append operation.

7      Max: Takes the max of the two values, which are required to be integers.

8      Min: Takes the min of the two values, which are required to be non-negative integers. An absent module flag is treated as having the value 0.

It is an error for a particular unique flag ID to have multiple behaviors, except in the case of Require (which adds restrictions on another metadata value) or Override.

An example of module flags:

!0 = !{i32 1, !"foo", i32 1}
!1 = !{i32 4, !"bar", i32 37}
!2 = !{i32 2, !"qux", i32 42}
!3 = !{i32 3, !"qux", !{!"foo", i32 1}}
!llvm.module.flags = !{!0, !1, !2, !3}

  • Metadata !0 has the ID !"foo" and the value ‘1’. The behavior if two or more !"foo" flags are seen is to emit an error if their values are not equal.

  • Metadata !1 has the ID !"bar" and the value ‘37’. The behavior if two or more !"bar" flags are seen is to use the value ‘37’.

  • Metadata !2 has the ID !"qux" and the value ‘42’. The behavior if two or more !"qux" flags are seen is to emit a warning if their values are not equal.

  • Metadata !3 has the ID !"qux" and the value:

    !{ !"foo", i32 1 }

    The behavior is to emit an error if the llvm.module.flags does not contain a flag with the ID !"foo" that has the value ‘1’ after linking is performed.
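The remaining behaviors (Max, Min, and Append) can be sketched in the same way. The flag IDs below are hypothetical, and the second module and the merged result are given only as comments, since they would live in separate files; the outcome is what the behavior descriptions above would lead one to expect, not a recorded linker output.

```llvm
; Module A.
!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 7, !"example.level", i32 1}   ; Max
!1 = !{i32 8, !"example.floor", i32 3}   ; Min
!2 = !{i32 5, !"example.list", !3}       ; Append
!3 = !{!"a"}

; Suppose module B carries:
;   !{i32 7, !"example.level", i32 2}
;   !{i32 8, !"example.floor", i32 5}
;   !{i32 5, !"example.list", !{!"b"}}
;
; Linking A with B would then be expected to produce:
;   example.level = 2                 (max of 1 and 2)
;   example.floor = 3                 (min of 3 and 5)
;   example.list  = !{!"a", !"b"}     (B's node appended to A's)
```

Note that if module B lacked the Min flag entirely, the absent flag would be treated as 0 and the merged value would be 0.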

Synthesized Functions Module Flags Metadata

These metadata specify the default attributes synthesized functions should have. These metadata are currently respected by a few instrumentation passes, such as sanitizers.

These metadata correspond to a few function attributes with significant code generation behaviors. Function attributes with just optimization purposes should not be listed because the performance impact of these synthesized functions is small.

  • “frame-pointer”: Max. The value can be 0, 1, or 2. A synthesized function will get the “frame-pointer” function attribute, with value being “none”, “non-leaf”, or “all”, respectively.

  • “function_return_thunk_extern”: The synthesized function will get the fn_return_thunk_extern function attribute.

  • “uwtable”: Max. The value can be 0, 1, or 2. If the value is 1, a synthesized function will get the uwtable(sync) function attribute; if the value is 2, a synthesized function will get the uwtable(async) function attribute.
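For example, a module that wants synthesized functions to keep all frame pointers and to carry asynchronous unwind tables could use flags like the following. This is a minimal sketch based on the values listed above:

```llvm
!llvm.module.flags = !{!0, !1}
!0 = !{i32 7, !"frame-pointer", i32 2}   ; Max; 2 selects "frame-pointer"="all"
!1 = !{i32 7, !"uwtable", i32 2}         ; Max; 2 selects uwtable(async)
```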

Objective-C Garbage Collection Module Flags Metadata

On the Mach-O platform, Objective-C stores metadata about garbage collection in a special section called “image info”. The metadata consists of a version number and a bitmask specifying what types of garbage collection are supported (if any) by the file. If two or more modules are linked together their garbage collection metadata needs to be merged rather than appended together.

The Objective-C garbage collection module flags metadata consists of the following key-value pairs:

Key

Value

Objective-C Version

[Required] — The Objective-C ABI version. Valid values are 1 and 2.

Objective-C Image Info Version

[Required] — The version of the image info section. Currently always 0.

Objective-C Image Info Section

[Required] — The section to place the metadata. Valid values are "__OBJC,__image_info,regular" for Objective-C ABI version 1, and "__DATA,__objc_imageinfo,regular,no_dead_strip" for Objective-C ABI version 2.

Objective-C Garbage Collection

[Required] — Specifies whether garbage collection is supported or not. Valid values are 0, for no garbage collection, and 2, for garbage collection supported.

Objective-C GC Only

[Optional] — Specifies that only garbage collection is supported. If present, its value must be 6. This flag requires that the Objective-C Garbage Collection flag have the value 2.

Some important flag interactions:

  • If a module with Objective-C Garbage Collection set to 0 is merged with a module with Objective-C Garbage Collection set to 2, then the resulting module has the Objective-C Garbage Collection flag set to 0.

  • A module with Objective-C Garbage Collection set to 0 cannot be merged with a module with Objective-C GC Only set to 6.
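As a rough sketch, a module built for Objective-C ABI version 2 without garbage collection might carry flags like the following. The keys and values come from the table above; the behavior IDs (Error for the first three flags, Override for the garbage collection flag so that the value 0 wins when merging) are an assumption made for this illustration and are not specified by this section.

```llvm
!llvm.module.flags = !{!0, !1, !2, !3}
!0 = !{i32 1, !"Objective-C Version", i32 2}
!1 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!2 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
; Assumed behavior: Override (4), so that 0 takes precedence over 2 on merge.
!3 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
```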

C type width Module Flags Metadata

The ARM backend emits a section into each generated object file describing the options that it was compiled with (in a compiler-independent way) to prevent linking incompatible objects, and to allow automatic library selection. Some of these options are not visible at the IR level, namely wchar_t width and enum width.

To pass this information to the backend, these options are encoded in module flags metadata, using the following key-value pairs:

Key

Value

short_wchar

  • 0 — sizeof(wchar_t) == 4

  • 1 — sizeof(wchar_t) == 2

short_enum

  • 0 — Enums are at least as large as an int.

  • 1 — Enums are stored in the smallest integer type which can represent all of its values.

For example, the following metadata section specifies that the module was compiled with a wchar_t width of 2 bytes (short_wchar is 1), and that enums are at least as large as an int (short_enum is 0):

!llvm.module.flags = !{!0, !1}
!0 = !{i32 1, !"short_wchar", i32 1}
!1 = !{i32 1, !"short_enum", i32 0}

Stack Alignment Metadata

Changes the default stack alignment from the target ABI’s implicit default stack alignment. Takes an i32 value in bytes. It is considered an error to link two modules together with different values for this metadata.

For example:

!llvm.module.flags = !{!0}
!0 = !{i32 1, !"override-stack-alignment", i32 8}

This will change the stack alignment to 8 bytes.

Embedded Objects Names Metadata

Offloading compilations need to embed device code into the host section table to create a fat binary. This metadata node references each global that will be embedded in the module. The primary use for this is to make referencing these globals more efficient in the IR. The metadata references nodes containing pointers to the global to be embedded followed by the section name it will be stored at:

!llvm.embedded.objects = !{!0}
!0 = !{ptr @object, !".section"}

Automatic Linker Flags Named Metadata

Some targets support embedding of flags to the linker inside individual object files. Typically this is used in conjunction with language extensions which allow source files to contain linker command line options, and have these automatically be transmitted to the linker via object files.

These flags are encoded in the IR using named metadata with the name !llvm.linker.options. Each operand is expected to be a metadata node which should be a list of other metadata nodes, each of which should be a list of metadata strings defining linker options.

For example, the following metadata section specifies two separate sets of linker options, presumably to link against libz and the Cocoa framework:

!0 = !{ !"-lz" }
!1 = !{ !"-framework", !"Cocoa" }
!llvm.linker.options = !{ !0, !1 }

The metadata encoding as lists of lists of options, as opposed to a collapsed list of options, is chosen so that the IR encoding can use multiple option strings to specify e.g., a single library, while still having that specifier be preserved as an atomic element that can be recognized by a target specific assembly writer or object file emitter.

Each individual option is required to be either a valid option for the target’s linker, or an option that is reserved by the target specific assembly writer or object file emitter. No other aspect of these options is defined by the IR.

Dependent Libs Named Metadata

Some targets support embedding of strings into object files to indicate a set of libraries to add to the link. Typically this is used in conjunction with language extensions which allow source files to explicitly declare the libraries they depend on, and have these automatically be transmitted to the linker via object files.

The list is encoded in the IR using named metadata with the name !llvm.dependent-libraries. Each operand is expected to be a metadata node which should contain a single string operand.

For example, the following metadata section contains two library specifiers:

!0 = !{!"a library specifier"}
!1 = !{!"another library specifier"}
!llvm.dependent-libraries = !{ !0, !1 }

Each library specifier will be handled independently by the consuming linker. The effect of the library specifiers is defined by the consuming linker.

ThinLTO Summary

Compiling with ThinLTO causes the building of a compact summary of the module that is emitted into the bitcode. The summary is emitted into the LLVM assembly and identified in syntax by a caret (‘^’).

The summary is parsed into a bitcode output, along with the Module IR, via the “llvm-as” tool. Tools that parse the Module IR for the purposes of optimization (e.g. “clang -x ir” and “opt”), will ignore the summary entries (just as they currently ignore summary entries in a bitcode input file).

Eventually, the summary will be parsed into a ModuleSummaryIndex object under the same conditions where summary index is currently built from bitcode. Specifically, tools that test the Thin Link portion of a ThinLTO compile (i.e. llvm-lto and llvm-lto2), or when parsing a combined index for a distributed ThinLTO backend via clang’s “-fthinlto-index=<>” flag (this part is not yet implemented, use llvm-as to create a bitcode object before feeding into thin link tools for now).

There are currently 3 types of summary entries in the LLVM assembly: module paths, global values, and type identifiers.

Module Path Summary Entry

Each module path summary entry lists a module containing global values included in the summary. For a single IR module there will be one such entry, but in a combined summary index produced during the thin link, there will be one module path entry per linked module with summary.

Example:

^0 = module: (path: "/path/to/file.o", hash: (2468601609, 1329373163, 1565878005, 638838075, 3148790418))

The path field is a string path to the bitcode file, and the hash field is the 160-bit SHA-1 hash of the IR bitcode contents, used for incremental builds and caching.

Global Value Summary Entry

Each global value summary entry corresponds to a global value defined or referenced by a summarized module.

Example:

^4 = gv: (name: "f"[, summaries: (Summary)[, (Summary)]*]?) ; guid = 14740650423002898831

For declarations, there will not be a summary list. For definitions, a global value will contain a list of summaries, one per module containing a definition. There can be multiple entries in a combined summary index for symbols with weak linkage.

Each Summary format will depend on whether the global value is a function, variable, or alias.

Function Summary

If the global value is a function, the Summary entry will look like:

function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 2[, FuncFlags]?[, Calls]?[, TypeIdInfo]?[, Params]?[, Refs]?)

The module field includes the summary entry id for the module containing this definition, and the flags field contains information such as the linkage type, a flag indicating whether it is legal to import the definition, whether it is globally live and whether the linker resolved it to a local definition (the latter two are populated during the thin link). The insts field contains the number of IR instructions in the function. Finally, there are several optional fields: FuncFlags, Calls, TypeIdInfo, Params, Refs.

Global Variable Summary

If the global value is a variable, the Summary entry will look like:

variable: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0)[, Refs]?)

The variable entry contains a subset of the fields in a function summary, see the descriptions there.

Alias Summary

If the global value is an alias, the Summary entry will look like:

alias: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), aliasee: ^2)

The module and flags fields are as described for a function summary. The aliasee field contains a reference to the global value summary entry of the aliasee.
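To show how these entries refer to each other, here is a small sketch of a summary for a single module, written against the grammar given above. The path, the symbol names, and the all-zero hash are placeholders invented for this illustration, and a particular LLVM version may print or require additional fields.

```llvm
; One module path entry for the single summarized module.
^0 = module: (path: "example.o", hash: (0, 0, 0, 0, 0))

; A defined function: its summary points back at the module entry ^0.
^1 = gv: (name: "f", summaries: (function: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 2)))

; An alias whose aliasee is the global value entry ^1.
^2 = gv: (name: "f_alias", summaries: (alias: (module: ^0, flags: (linkage: external, notEligibleToImport: 0, live: 0, dsoLocal: 0), aliasee: ^1)))

; A declaration only: no summary list.
^3 = gv: (name: "g")
```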

Function Flags

The optional FuncFlags field looks like:

funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 0, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0)

If unspecified, flags are assumed to hold the conservative false value of 0.

Calls

The optional Calls field looks like:

calls: ((Callee)[, (Callee)]*)

where each Callee looks like:

callee: ^1[, hotness: None]?[, relbf: 0]?

The callee refers to the summary entry id of the callee. At most one of hotness (which can take the values Unknown, Cold, None, Hot, and Critical), and relbf (which holds the integer branch frequency relative to the entry frequency, scaled down by 2^8) may be specified. The defaults are Unknown and 0, respectively.

Params

The optional Params field is used by StackSafety and looks like:

params: ((Param)[, (Param)]*)

where each Param describes pointer parameter access inside of the function and looks like:

param: 4, offset: [0, 5][, calls: ((Callee)[, (Callee)]*)]?

where the first param is the number of the parameter it describes, and offset is the inclusive range of offsets from the pointer parameter to bytes which can be accessed by the function. This range does not include accesses by function calls from the calls list.

Each Callee describes how the parameter is forwarded into other functions and looks like:

callee: ^3, param: 5, offset: [-3, 3]

The callee refers to the summary entry id of the callee, and param is the number of the callee parameter which points into the caller’s parameter with offset known to be inside of the offset range. calls will be consumed and removed by the thin link stage to update Param::offset so it covers all accesses possible by calls.

A pointer parameter without a corresponding Param is considered unsafe, and access with any offset is assumed to be possible.

Example:

If we have the following function:

define i64 @foo(ptr %0, ptr %1, ptr %2, i8 %3) {
  store ptr %1, ptr @x
  %5 = getelementptr inbounds i8, ptr %2, i64 5
  %6 = load i8, ptr %5
  %7 = getelementptr inbounds i8, ptr %2, i8 %3
  tail call void @bar(i8 %3, ptr %7)
  %8 = load i64, ptr %0
  ret i64 %8
}

We can expect a record like this:

params: ((param: 0, offset: [0, 7]), (param: 2, offset: [5, 5], calls: ((callee: ^3, param: 1, offset: [-128, 127]))))

The function may access just 8 bytes of the parameter %0. calls is empty, so the parameter is either not used for function calls or offset already covers all accesses from nested function calls.

Parameter %1 escapes, so access is unknown.

The function itself can access just a single byte of the parameter %2. Additional access is possible inside of the @bar or ^3. The function adds a signed offset to the pointer and passes the result as the argument %1 into ^3. This record itself does not tell us how ^3 will access the parameter.

Parameter %3 is not a pointer.

Refs

The optional Refs field looks like:

refs: ((Ref)[, (Ref)]*)

where each Ref contains a reference to the summary id of the referenced value (e.g. ^1).

TypeIdInfo

The optional TypeIdInfo field, used for Control Flow Integrity, looks like:

typeIdInfo: [(TypeTests)]?[, (TypeTestAssumeVCalls)]?[, (TypeCheckedLoadVCalls)]?[, (TypeTestAssumeConstVCalls)]?[, (TypeCheckedLoadConstVCalls)]?

These optional fields have the following forms:

TypeTests
typeTests: (TypeIdRef[, TypeIdRef]*)

Where each TypeIdRef refers to a type id by summary id or GUID.

TypeTestAssumeVCalls
typeTestAssumeVCalls: (VFuncId[, VFuncId]*)

Where each VFuncId has the format:

vFuncId: (TypeIdRef, offset: 16)

Where each TypeIdRef refers to a type id by summary id or GUID preceded by a guid: tag.

TypeCheckedLoadVCalls
typeCheckedLoadVCalls: (VFuncId[, VFuncId]*)

Where each VFuncId has the format described for TypeTestAssumeVCalls.

TypeTestAssumeConstVCalls
typeTestAssumeConstVCalls: (ConstVCall[, ConstVCall]*)

Where each ConstVCall has the format:

(VFuncId, args: (Arg[, Arg]*))

and where each VFuncId has the format described for TypeTestAssumeVCalls, and each Arg is an integer argument number.

TypeCheckedLoadConstVCalls
typeCheckedLoadConstVCalls: (ConstVCall[, ConstVCall]*)

Where each ConstVCall has the format described for TypeTestAssumeConstVCalls.

Type ID Summary Entry

Each type id summary entry corresponds to a type identifier resolution which is generated during the LTO link portion of the compile when building with Control Flow Integrity, so these are only present in a combined summary index.

Example:

^4 = typeid: (name: "_ZTS1A", summary: (typeTestRes: (kind: allOnes, sizeM1BitWidth: 7[, alignLog2: 0]?[, sizeM1: 0]?[, bitMask: 0]?[, inlineBits: 0]?)[, WpdResolutions]?)) ; guid = 7004155349499253778

The typeTestRes gives the type test resolution kind (which may be unsat, byteArray, inline, single, or allOnes), and the size-1 bit width. It is followed by optional flags, which default to 0, and an optional WpdResolutions (whole program devirtualization resolution) field that looks like:

wpdResolutions: ((offset: 0, WpdRes)[, (offset: 1, WpdRes)]*)

where each entry is a mapping from the given byte offset to the whole-program devirtualization resolution WpdRes, that has one of the following formats:

wpdRes: (kind: branchFunnel)
wpdRes: (kind: singleImpl, singleImplName: "_ZN1A1nEi")
wpdRes: (kind: indir)

Additionally, each wpdRes has an optional resByArg field, which describes the resolutions for calls with all constant integer arguments:

resByArg: (ResByArg[, ResByArg]*)

where ResByArg is:

args: (Arg[, Arg]*), byArg: (kind: UniformRetVal[, info: 0][, byte: 0][, bit: 0])

Where the kind can be Indir, UniformRetVal, UniqueRetVal or VirtualConstProp. The info field is only used if the kind is UniformRetVal (indicates the uniform return value), or UniqueRetVal (holds the return value associated with the unique vtable (0 or 1)). The byte and bit fields are only used if the target does not support the use of absolute symbols to store constants.

Intrinsic Global Variables

LLVM has a number of “magic” global variables that contain data that affect code generation or other IR semantics. These are documented here. All globals of this sort should have a section specified as “llvm.metadata”. This section and all globals that start with “llvm.” are reserved for use by LLVM.

The ‘llvm.used’ Global Variable

The @llvm.used global is an array which has appending linkage. This array contains a list of pointers to named global variables, functions and aliases which may optionally have a pointer cast formed of bitcast or getelementptr. For example, a legal use of it is:

@X = global i8 4
@Y = global i32 123

@llvm.used = appending global [2 x ptr] [
   ptr @X,
   ptr @Y
], section "llvm.metadata"

If a symbol appears in the @llvm.used list, then the compiler, assembler, and linker are required to treat the symbol as if there is a reference to the symbol that it cannot see (which is why they have to be named). For example, if a variable has internal linkage and no references other than that from the @llvm.used list, it cannot be deleted. This is commonly used to represent references from inline asms and other things the compiler cannot “see”, and corresponds to “__attribute__((used))” in GNU C.

On some targets, the code generator must emit a directive to the assembler or object file to prevent the assembler and linker from removing the symbol.
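The internal-linkage case can be sketched as follows. The names @counter and @helper are hypothetical; neither symbol has any reference other than the one from @llvm.used.

```llvm
@counter = internal global i32 0

define internal void @helper() {
  ret void
}

; Without this list, @counter and @helper have internal linkage and no other
; references, so an optimizer would be free to delete them.
@llvm.used = appending global [2 x ptr] [ptr @counter, ptr @helper], section "llvm.metadata"
```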

The ‘llvm.compiler.used’ Global Variable

The @llvm.compiler.used directive is the same as the @llvm.used directive, except that it only prevents the compiler from touching the symbol. On targets that support it, this allows an intelligent linker to optimize references to the symbol without being impeded as it would be by @llvm.used.

This is a rare construct that should only be used in rare circumstances, and should not be exposed to source languages.
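Its form mirrors that of @llvm.used. A minimal sketch, with a hypothetical @lookup_table global:

```llvm
@lookup_table = internal constant [2 x i32] [i32 1, i32 2]

; The compiler must keep @lookup_table, but the linker is not constrained.
@llvm.compiler.used = appending global [1 x ptr] [ptr @lookup_table], section "llvm.metadata"
```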

The ‘llvm.global_ctors’ Global Variable

%0 = type { i32, ptr, ptr }
@llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, ptr @ctor, ptr @data }]

The @llvm.global_ctors array contains a list of constructor functions, priorities, and an associated global or function. The functions referenced by this array will be called in ascending order of priority (i.e. lowest first) when the module is loaded. The order of functions with the same priority is not defined.

If the third field is non-null, and points to a global variable or function, the initializer function will only run if the associated data from the current module is not discarded. On ELF the referenced global variable or function must be in a comdat.
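The priority ordering can be illustrated with two constructors. In this sketch the function names and the priority value 101 are chosen only for the example; the third field is null, so neither constructor is tied to associated data.

```llvm
define internal void @init_early() {
  ret void
}

define internal void @init_late() {
  ret void
}

; @init_early (priority 101) is called before @init_late (priority 65535),
; because constructors run in ascending order of priority.
@llvm.global_ctors = appending global [2 x { i32, ptr, ptr }] [
  { i32, ptr, ptr } { i32 101, ptr @init_early, ptr null },
  { i32, ptr, ptr } { i32 65535, ptr @init_late, ptr null }
]
```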

The ‘llvm.global_dtors’ Global Variable

%0 = type { i32, ptr, ptr }
@llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, ptr @dtor, ptr @data }]

The @llvm.global_dtors array contains a list of destructor functions, priorities, and an associated global or function. The functions referenced by this array will be called in descending order of priority (i.e. highest first) when the module is unloaded. The order of functions with the same priority is not defined.

If the third field is non-null, and points to a global variable or function, the destructor function will only run if the associated data from the current module is not discarded. On ELF the referenced global variable or function must be in a comdat.

Instruction Reference

The LLVM instruction set consists of several different classifications of instructions: terminator instructions, binary instructions, bitwise binary instructions, memory instructions, and other instructions. There are also debug records, which are not instructions themselves but are printed interleaved with instructions to describe changes in the state of the program’s debug information at each position in the program’s execution.

Terminator Instructions

As mentioned previously, every basic block in a program ends with a “Terminator” instruction, which indicates which block should be executed after the current block is finished. These terminator instructions typically yield a ‘void’ value: they produce control flow, not values (the one exception being the ‘invoke’ instruction).

The terminator instructions are: ‘ret’, ‘br’, ‘switch’, ‘indirectbr’, ‘invoke’, ‘callbr’, ‘resume’, ‘catchswitch’, ‘catchret’, ‘cleanupret’, and ‘unreachable’.

‘ret’ Instruction

Syntax:

ret <type> <value>       ; Return a value from a non-void function
ret void                 ; Return from void function

Overview:

The ‘ret’ instruction is used to return control flow (and optionally a value) from a function back to the caller.

There are two forms of the ‘ret’ instruction: one that returns a value and then causes control flow, and one that just causes control flow to occur.

Arguments:

The ‘ret’ instruction optionally accepts a single argument, the return value. The type of the return value must be a ‘first class’ type.

A function is not well formed if it has a non-void return type and contains a ‘ret’ instruction with no return value or a return value with a type that does not match its type, or if it has a void return type and contains a ‘ret’ instruction with a return value.

Semantics:

When the ‘ret’ instruction is executed, control flow returns back to the calling function’s context. If the caller is a “call” instruction, execution continues at the instruction after the call. If the caller was an “invoke” instruction, execution continues at the beginning of the “normal” destination block. If the instruction returns a value, that value shall set the call or invoke instruction’s return value.

Example:

ret i32 5                       ; Return an integer value of 5
ret void                        ; Return from a void function
ret { i32, i8 } { i32 4, i8 2 } ; Return a struct of values 4 and 2
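The following sketch places the struct-returning form in a complete function and shows how the returned value becomes the result of the call instruction in the caller. The function names @make_pair and @caller are hypothetical.

```llvm
define { i32, i8 } @make_pair() {
  ret { i32, i8 } { i32 4, i8 2 }
}

define i32 @caller() {
  ; The value given to 'ret' in @make_pair is the result of this call.
  %p = call { i32, i8 } @make_pair()
  %first = extractvalue { i32, i8 } %p, 0
  ret i32 %first
}
```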

‘br’ Instruction

Syntax:

br i1 <cond>, label <iftrue>, label <iffalse>
br label <dest>          ; Unconditional branch

Overview:

The ‘br’ instruction is used to cause control flow to transfer to a different basic block in the current function. There are two forms of this instruction, corresponding to a conditional branch and an unconditional branch.

Arguments:

The conditional branch form of the ‘br’ instruction takes a single ‘i1’ value and two ‘label’ values. The unconditional form of the ‘br’ instruction takes a single ‘label’ value as a target.

Semantics:

Upon execution of a conditional ‘br’ instruction, the ‘i1’ argument is evaluated. If the value is true, control flows to the ‘iftrue’ label argument. If “cond” is false, control flows to the ‘iffalse’ label argument. If ‘cond’ is poison or undef, this instruction has undefined behavior.

Example:

Test:
  %cond = icmp eq i32 %a, %b
  br i1 %cond, label %IfEqual, label %IfUnequal
IfEqual:
  ret i32 1
IfUnequal:
  ret i32 0
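Both forms typically appear together in loops. The sketch below uses a hypothetical function @sum_to that adds the integers 0 through %n - 1: an unconditional ‘br’ enters the loop, and a conditional ‘br’ either repeats the loop body or leaves it.

```llvm
define i32 @sum_to(i32 %n) {
entry:
  br label %loop                              ; unconditional form

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %acc.next = add i32 %acc, %i
  %i.next = add i32 %i, 1
  %done = icmp uge i32 %i.next, %n
  br i1 %done, label %exit, label %loop       ; conditional form

exit:
  ret i32 %acc.next
}
```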

‘switch’ Instruction

Syntax:

switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]

Overview:

The ‘switch’ instruction is used to transfer control flow to one of several different places. It is a generalization of the ‘br’ instruction, allowing a branch to occur to one of many possible destinations.

Arguments:

The ‘switch’ instruction uses three parameters: an integer comparison value ‘value’, a default ‘label’ destination, and an array of pairs of comparison value constants and ‘label’s. The table is not allowed to contain duplicate constant entries.

Semantics:

The switch instruction specifies a table of values and destinations. When the ‘switch’ instruction is executed, this table is searched for the given value. If the value is found, control flow is transferred to the corresponding destination; otherwise, control flow is transferred to the default destination. If ‘value’ is poison or undef, this instruction has undefined behavior.

Implementation:

Depending on properties of the target machine and the particular switch instruction, this instruction may be code generated in different ways. For example, it could be generated as a series of chained conditional branches or with a lookup table.

Example:

; Emulate a conditional br instruction
%Val = zext i1 %value to i32
switch i32 %Val, label %truedest [ i32 0, label %falsedest ]

; Emulate an unconditional br instruction
switch i32 0, label %dest [ ]

; Implement a jump table:
switch i32 %val, label %otherwise [ i32 0, label %onzero
                                    i32 1, label %onone
                                    i32 2, label %ontwo ]
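For context, the jump table form can be placed in a complete function as sketched below. The function name @classify and the returned constants are hypothetical.

```llvm
define i32 @classify(i32 %x) {
entry:
  ; Values not listed in the table go to the default destination %other.
  switch i32 %x, label %other [ i32 0, label %zero
                                i32 1, label %one ]

zero:
  ret i32 10

one:
  ret i32 20

other:
  ret i32 -1
}
```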

‘indirectbr’ Instruction

Syntax:

indirectbr ptr <address>, [ label <dest1>, label <dest2>, ... ]

Overview:

The ‘indirectbr’ instruction implements an indirect branch to a label within the current function, whose address is specified by “address”. Address must be derived from a blockaddress constant.

Arguments:

The ‘address’ argument is the address of the label to jump to. The rest of the arguments indicate the full set of possible destinations that the address may point to. Blocks are allowed to occur multiple times in the destination list, though this isn’t particularly useful.

This destination list is required so that dataflow analysis has an accurate understanding of the CFG.

Semantics:

Control transfers to the block specified in the address argument. All possible destination blocks must be listed in the label list, otherwise this instruction has undefined behavior. This implies that jumps to labels defined in other functions have undefined behavior as well. If ‘address’ is poison or undef, this instruction has undefined behavior.

Implementation:

This is typically implemented with a jump through a register.

Example:

indirectbr ptr %Addr, [ label %bb1, label %bb2, label %bb3 ]
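A self-contained sketch shows where such an address can come from. Here the hypothetical function @pick selects between two blockaddress constants and branches to the chosen block; every possible target appears in the destination list.

```llvm
define i32 @pick(i1 %flag) {
entry:
  ; The address is derived from blockaddress constants of this function.
  %addr = select i1 %flag, ptr blockaddress(@pick, %bb1), ptr blockaddress(@pick, %bb2)
  indirectbr ptr %addr, [ label %bb1, label %bb2 ]

bb1:
  ret i32 1

bb2:
  ret i32 2
}
```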

‘invoke’ Instruction

Syntax:

<result> = invoke [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
              [operand bundles] to label <normal label> unwind label <exception label>

Overview:

The ‘invoke’ instruction causes control to transfer to a specified function, with the possibility of control flow transfer to either the ‘normal’ label or the ‘exception’ label. If the callee function returns with the “ret” instruction, control flow will return to the “normal” label. If the callee (or any indirect callees) returns via the “resume” instruction or other exception handling mechanism, control is interrupted and continued at the dynamically nearest “exception” label.

The ‘exception’ label is a landing pad for the exception. As such, the ‘exception’ label is required to have the “landingpad” instruction, which contains the information about the behavior of the program after unwinding happens, as its first non-PHI instruction. The restrictions on the “landingpad” instruction tightly couple it to the “invoke” instruction, so that the important information contained within the “landingpad” instruction can’t be lost through normal code motion.

Arguments:

This instruction requires several arguments:

  1. The optional “cconv” marker indicates which calling convention the call should use. If none is specified, the call defaults to using C calling conventions.

  2. The optional Parameter Attributes list for return values. Only ‘zeroext’, ‘signext’, ‘noext’, and ‘inreg’ attributes are valid here.

  3. The optional addrspace attribute can be used to indicate the address space of the called function. If it is not specified, the program address space from the datalayout string will be used.

  4. ‘ty’: the type of the call instruction itself which is also the type of the return value. Functions that return no value are marked void. The signature is computed based on the return type and argument types.

  5. ‘fnty’: shall be the signature of the function being invoked. The argument types must match the types implied by this signature. This is only required if the signature specifies a varargs type.

  6. ‘fnptrval’: An LLVM value containing a pointer to a function to be invoked. In most cases, this is a direct function invocation, but indirect invoke’s are just as possible, calling an arbitrary pointer to function value.

  7. ‘function args’: argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.

  8. ‘normal label’: the label reached when the called function executes a ‘ret’ instruction.

  9. ‘exception label’: the label reached when a callee returns via the resume instruction or other exception handling mechanism.

  10. The optional function attributes list.

  11. The optional operand bundles list.

Semantics:

This instruction is designed to operate as a standard ‘call’ instruction in most regards. The primary difference is that it establishes an association with a label, which is used by the runtime library to unwind the stack.

This instruction is used in languages with destructors to ensure that proper cleanup is performed in the case of either a longjmp or a thrown exception. Additionally, this is important for implementation of ‘catch’ clauses in high-level languages that support them.

For the purposes of the SSA form, the definition of the value returned by the ‘invoke’ instruction is deemed to occur on the edge from the current block to the “normal” label. If the callee unwinds then no return value is available.

Example:

%retval = invoke i32 @Test(i32 15) to label %Continue
            unwind label %TestCleanup              ; i32:retval set
%retval = invoke coldcc i32 %Testfnptr(i32 15) to label %Continue
            unwind label %TestCleanup              ; i32:retval set
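To show how the ‘normal’ and ‘exception’ labels fit into a whole function, here is a minimal sketch. The names @may_throw and @guarded are hypothetical, and the choice of @__gxx_personality_v0 (the Itanium C++ personality routine) is an assumption made for the example. The exception block begins with a landingpad, performs no cleanup work of its own, and continues unwinding with ‘resume’.

```llvm
declare i32 @may_throw(i32)
declare i32 @__gxx_personality_v0(...)

define i32 @guarded(i32 %x) personality ptr @__gxx_personality_v0 {
entry:
  %r = invoke i32 @may_throw(i32 %x)
          to label %ok unwind label %cleanup

ok:
  ; %r is defined on the edge to the normal label, so it is usable here.
  ret i32 %r

cleanup:
  ; The landingpad must be the first non-PHI instruction of this block.
  %exn = landingpad { ptr, i32 }
          cleanup
  resume { ptr, i32 } %exn
}
```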

‘callbr’ Instruction

Syntax:

<result> = callbr [cconv] [ret attrs] [addrspace(<num>)] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs]
              [operand bundles] to label <fallthrough label> [indirect labels]

Overview:

The ‘callbr’ instruction causes control to transfer to a specified function, with the possibility of control flow transfer to either the ‘fallthrough’ label or one of the ‘indirect’ labels.

This instruction should only be used to implement the “goto” feature of gcc style inline assembly. Any other usage is an error in the IR verifier.

Note that in order to support outputs along indirect edges, LLVM may need to split critical edges, which may require synthesizing a replacement block for the indirect labels. Therefore, the address of a label as seen by another callbr instruction, or for a blockaddress constant, may not be equal to the address provided for the same block to this instruction’s indirect labels operand. The assembly code may only transfer control to addresses provided via this instruction’s indirect labels.

On target architectures that implement branch target enforcement by requiring indirect (register-controlled) branch instructions to jump only to locations marked by a special instruction (such as AArch64 bti), the called code is expected not to use such an indirect branch to transfer control to the locations in indirect labels. Therefore, including a label in the indirect labels of a callbr does not require the compiler to put a bti or equivalent instruction at the label.

Arguments:

This instruction requires several arguments:

  1. The optional “cconv” marker indicates which calling convention the call should use. If none is specified, the call defaults to using C calling conventions.

  2. The optional Parameter Attributes list for return values. Only ‘zeroext’, ‘signext’, ‘noext’, and ‘inreg’ attributes are valid here.

  3. The optional addrspace attribute can be used to indicate the address space of the called function. If it is not specified, the program address space from the datalayout string will be used.

  4. ‘ty’: the type of the call instruction itself which is also the type of the return value. Functions that return no value are marked void. The signature is computed based on the return type and argument types.

  5. ‘fnty’: shall be the signature of the function being called. The argument types must match the types implied by this signature. This is only required if the signature specifies a varargs type.

  6. ‘fnptrval’: An LLVM value containing a pointer to a function to be called. In most cases, this is a direct function call, but other callbr’s are just as possible, calling an arbitrary pointer to function value.

  7. ‘function args’: argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.

  8. ‘fallthrough label’: the label reached when the inline assembly’s execution exits the bottom.

  9. ‘indirect labels’: the labels reached when a callee transfers control to a location other than the ‘fallthrough label’. Label constraints refer to these destinations.

  10. The optional function attributes list.

  11. The optional operand bundles list.

Semantics:

This instruction is designed to operate as a standard ‘call’ instruction in most regards. The primary difference is that it establishes an association with additional labels to define where control flow goes after the call.

The output values of a ‘callbr’ instruction are available both in the ‘fallthrough’ block and in any ‘indirect’ block(s).

The only use of this today is to implement the “goto” feature of gcc inline assembly, where additional labels can be provided as locations for the inline assembly to jump to.

Example:
; "asm goto" without output constraints.
callbr void asm "", "r,!i"(i32 %x)
        to label %fallthrough [label %indirect]

; "asm goto" with output constraints.
<result> = callbr i32 asm "", "=r,r,!i"(i32 %x)
        to label %fallthrough [label %indirect]

‘resume’ Instruction

Syntax:
resume <type> <value>
Overview:

The ‘resume’ instruction is a terminator instruction that has no successors.

Arguments:

The ‘resume’ instruction requires one argument, which must have the same type as the result of any ‘landingpad’ instruction in the same function.

Semantics:

The ‘resume’ instruction resumes propagation of an existing (in-flight) exception whose unwinding was interrupted with a landingpad instruction.

Example:
resume { ptr, i32 } %exn

‘catchswitch’ Instruction

Syntax:
<resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind to caller
<resultval> = catchswitch within <parent> [ label <handler1>, label <handler2>, ... ] unwind label <default>
Overview:

The ‘catchswitch’ instruction is used by LLVM’s exception handling system to describe the set of possible catch handlers that may be executed by the EH personality routine.

Arguments:

The parent argument is the token of the funclet that contains the catchswitch instruction. If the catchswitch is not inside a funclet, this operand may be the token none.

The default argument is the label of another basic block beginning with either a cleanuppad or catchswitch instruction. This unwind destination must be a legal target with respect to the parent links, as described in the exception handling documentation.

The handlers are a nonempty list of successor blocks that each begin with a catchpad instruction.

Semantics:

Executing this instruction transfers control to one of the successors in handlers, if appropriate, or continues to unwind via the unwind label if present.

The catchswitch is both a terminator and a “pad” instruction, meaning that it must be both the first non-phi instruction and last instruction in the basic block. Therefore, it must be the only non-phi instruction in the block.

Example:
dispatch1:
  %cs1 = catchswitch within none [label %handler0, label %handler1] unwind to caller
dispatch2:
  %cs2 = catchswitch within %parenthandler [label %handler0] unwind label %cleanup

‘catchret’ Instruction

Syntax:
catchret from <token> to label <normal>
Overview:

The ‘catchret’ instruction is a terminator instruction that has a single successor.

Arguments:

The first argument to a ‘catchret’ indicates which catchpad it exits. It must be a catchpad. The second argument to a ‘catchret’ specifies where control will transfer to next.

Semantics:

The ‘catchret’ instruction ends an existing (in-flight) exception whose unwinding was interrupted with a catchpad instruction. The personality function gets a chance to execute arbitrary code to, for example, destroy the active exception. Control then transfers to normal.

The token argument must be a token produced by a catchpad instruction. If the specified catchpad is not the most-recently-entered not-yet-exited funclet pad (as described in the EH documentation), the catchret’s behavior is undefined.

Example:
catchret from %catch to label %continue

‘cleanupret’ Instruction

Syntax:
cleanupret from <value> unwind label <continue>
cleanupret from <value> unwind to caller
Overview:

The ‘cleanupret’ instruction is a terminator instruction that has an optional successor.

Arguments:

The ‘cleanupret’ instruction requires one argument, which indicates which cleanuppad it exits, and must be a cleanuppad. If the specified cleanuppad is not the most-recently-entered not-yet-exited funclet pad (as described in the EH documentation), the cleanupret’s behavior is undefined.

The ‘cleanupret’ instruction also has an optional successor, continue, which must be the label of another basic block beginning with either a cleanuppad or catchswitch instruction. This unwind destination must be a legal target with respect to the parent links, as described in the exception handling documentation.

Semantics:

The ‘cleanupret’ instruction indicates to the personality function that one cleanuppad it transferred control to has ended. It transfers control to continue or unwinds out of the function.

Example:
cleanupret from %cleanup unwind to caller
cleanupret from %cleanup unwind label %continue

‘unreachable’ Instruction

Syntax:
unreachable
Overview:

The ‘unreachable’ instruction has no defined semantics. This instruction is used to inform the optimizer that a particular portion of the code is not reachable. This can be used to indicate that the code after a no-return function cannot be reached, and other facts.

Semantics:

The ‘unreachable’ instruction has no defined semantics.

Unary Operations

Unary operators require a single operand, execute an operation on it, and produce a single value. The operand might represent multiple data, as is the case with the vector data type. The result value has the same type as its operand.

‘fneg’ Instruction

Syntax:
<result> = fneg [fast-math flags]* <ty> <op1>   ; yields ty:result
Overview:

The ‘fneg’ instruction returns the negation of its operand.

Arguments:

The argument to the ‘fneg’ instruction must be a floating-point or vector of floating-point values.

Semantics:

The value produced is a copy of the operand with its sign bit flipped. The value is otherwise completely identical; in particular, if the input is a NaN, then the quiet/signaling bit and payload are perfectly preserved.

This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
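The sign-bit rule can be illustrated with a small Python model that works directly on the raw 32-bit pattern of a float. This is only a sketch: the helper names (fneg_f32_bits, f32_to_bits, bits_to_f32) are hypothetical and not part of LLVM, and the model covers the float type only.

```python
import struct

def fneg_f32_bits(bits: int) -> int:
    """Model fneg on a 32-bit float pattern: flip bit 31, keep everything else."""
    return (bits ^ 0x80000000) & 0xFFFFFFFF

def f32_to_bits(x: float) -> int:
    """Reinterpret a Python float (rounded to binary32) as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# A signaling NaN with a non-zero payload: only the sign bit changes.
snan = 0x7FA00001
negated = fneg_f32_bits(snan)
```

Working on the bit pattern rather than on Python floats avoids any platform-specific NaN quieting during conversion, which is the property the paragraph above is concerned with.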

Example:
<result> = fneg float %val          ; yields float:result = -%val

Binary Operations

Binary operators are used to do most of the computation in a program. They require two operands of the same type, execute an operation on them, and produce a single value. The operands might represent multiple data, as is the case with the vector data type. The result value has the same type as its operands.

There are several different binary operators:

‘add’ Instruction

Syntax:
<result> = add <ty> <op1>, <op2>          ; yields ty:result
<result> = add nuw <ty> <op1>, <op2>      ; yields ty:result
<result> = add nsw <ty> <op1>, <op2>      ; yields ty:result
<result> = add nuw nsw <ty> <op1>, <op2>  ; yields ty:result
Overview:

The ‘add’ instruction returns the sum of its two operands.

Arguments:

The two arguments to the ‘add’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the integer sum of the two operands.

If the sum has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.

Because LLVM integers use a two’s complement representation, this instruction is appropriate for both signed and unsigned integers.

nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs.
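The wrapping and poison rules above can be sketched as a Python model. The function add_int and the POISON sentinel are hypothetical names used only for illustration; operands are passed as n-bit patterns, and poison is modeled as a distinguished object rather than with LLVM's full poison propagation rules.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def add_int(a: int, b: int, n: int, nuw: bool = False, nsw: bool = False):
    """Model `add iN a, b` with optional nuw/nsw flags."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask

    def signed(x: int) -> int:
        # Reinterpret an n-bit pattern as a two's complement value.
        return x - (1 << n) if x >> (n - 1) else x

    # Unsigned overflow: the exact sum does not fit in n bits.
    if nuw and a + b > mask:
        return POISON
    # Signed overflow: the exact signed sum leaves [-2^(n-1), 2^(n-1) - 1].
    s = signed(a) + signed(b)
    if nsw and not (-(1 << (n - 1)) <= s < (1 << (n - 1))):
        return POISON
    return (a + b) & mask  # mathematical result modulo 2^n
```

For example, on i8 the pattern 255 plus 1 wraps to 0. Under nuw that addition is poison, while under nsw it is fine, because as signed values it is -1 + 1.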

Example:
<result> = add i32 4, %var          ; yields i32:result = 4 + %var

‘fadd’ Instruction

Syntax:
<result> = fadd [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘fadd’ instruction returns the sum of its two operands.

Arguments:

The two arguments to the ‘fadd’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.

Semantics:

The value produced is the floating-point sum of the two operands. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.

Example:
<result> = fadd float 4.0, %var          ; yields float:result = 4.0 + %var

‘sub’ Instruction

Syntax:
<result> = sub <ty> <op1>, <op2>          ; yields ty:result
<result> = sub nuw <ty> <op1>, <op2>      ; yields ty:result
<result> = sub nsw <ty> <op1>, <op2>      ; yields ty:result
<result> = sub nuw nsw <ty> <op1>, <op2>  ; yields ty:result
Overview:

The ‘sub’ instruction returns the difference of its two operands.

Note that the ‘sub’ instruction is used to represent the ‘neg’ instruction present in most other intermediate representations.

Arguments:

The two arguments to the ‘sub’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the integer difference of the two operands.

If the difference has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.

Because LLVM integers use a two’s complement representation, this instruction is appropriate for both signed and unsigned integers.

nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the sub is a poison value if unsigned and/or signed overflow, respectively, occurs.

Example:
<result> = sub i32 4, %var          ; yields i32:result = 4 - %var
<result> = sub i32 0, %val          ; yields i32:result = -%val

‘fsub’ Instruction

Syntax:
<result> = fsub [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘fsub’ instruction returns the difference of its two operands.

Arguments:

The two arguments to the ‘fsub’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.

Semantics:

The value produced is the floating-point difference of the two operands. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.

Example:
<result> = fsub float 4.0, %var           ; yields float:result = 4.0 - %var
<result> = fsub float -0.0, %val          ; yields float:result = -%val

‘mul’ Instruction

Syntax:
<result> = mul <ty> <op1>, <op2>          ; yields ty:result
<result> = mul nuw <ty> <op1>, <op2>      ; yields ty:result
<result> = mul nsw <ty> <op1>, <op2>      ; yields ty:result
<result> = mul nuw nsw <ty> <op1>, <op2>  ; yields ty:result
Overview:

The ‘mul’ instruction returns the product of its two operands.

Arguments:

The two arguments to the ‘mul’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the integer product of the two operands.

If the result of the multiplication has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.

Because LLVM integers use a two’s complement representation, and the result is the same width as the operands, this instruction returns the correct result for both signed and unsigned integers. If a full product (e.g. i32 * i32 -> i64) is needed, the operands should be sign-extended or zero-extended as appropriate to the width of the full product.

nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the mul is a poison value if unsigned and/or signed overflow, respectively, occurs.
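The remark about full products can be made concrete with a Python model. The helpers sext, zext, and mul_int are hypothetical names that mirror the corresponding IR operations on n-bit patterns; the sketch ignores the nuw and nsw flags.

```python
def sext(x: int, frm: int, to: int) -> int:
    """Sign-extend a `frm`-bit pattern to a `to`-bit pattern."""
    x &= (1 << frm) - 1
    if x >> (frm - 1):
        x -= 1 << frm
    return x & ((1 << to) - 1)

def zext(x: int, frm: int, to: int) -> int:
    """Zero-extend a `frm`-bit pattern to a `to`-bit pattern."""
    return x & ((1 << frm) - 1)

def mul_int(a: int, b: int, n: int) -> int:
    """Model `mul iN a, b` without flags: the product modulo 2^n."""
    return (a * b) & ((1 << n) - 1)

a, b = 0xFFFFFFFF, 2  # a is -1 if read as signed, 4294967295 if unsigned

narrow = mul_int(a, b, 32)                                      # same for both readings
signed_full = mul_int(sext(a, 32, 64), sext(b, 32, 64), 64)     # -2 as an i64 pattern
unsigned_full = mul_int(zext(a, 32, 64), zext(b, 32, 64), 64)   # 8589934590
```

The 32-bit result is identical under both interpretations, but the 64-bit products differ, which is why the choice of extension has to be made before the multiplication.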

Example:
<result> = mul i32 4, %var          ; yields i32:result = 4 * %var

‘fmul’ Instruction

Syntax:
<result> = fmul [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘fmul’ instruction returns the product of its two operands.

Arguments:

The two arguments to the ‘fmul’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.

Semantics:

The value produced is the floating-point product of the two operands. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.

Example:
<result> = fmul float 4.0, %var          ; yields float:result = 4.0 * %var

‘udiv’ Instruction

Syntax:
<result> = udiv <ty> <op1>, <op2>         ; yields ty:result
<result> = udiv exact <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘udiv’ instruction returns the quotient of its two operands.

Arguments:

The two arguments to the ‘udiv’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the unsigned integer quotient of the two operands.

Note that unsigned integer division and signed integer division are distinct operations; for signed integer division, use ‘sdiv’.

Division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior.

If the exact keyword is present, the result value of the udiv is a poison value if %op1 is not a multiple of %op2 (as such, “((a udiv exact b) mul b) == a”).
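A minimal Python sketch of the exact rule follows. The names udiv and POISON are hypothetical, and raising an exception for division by zero is only a modeling choice: real undefined behavior does not guarantee a trap.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def udiv(a: int, b: int, n: int, exact: bool = False):
    """Model `udiv iN a, b`; operands are read as unsigned n-bit patterns."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if b == 0:
        # Assumption: undefined behavior is modeled as a Python exception.
        raise ZeroDivisionError("udiv by zero is undefined behavior")
    if exact and a % b != 0:
        return POISON  # op1 is not a multiple of op2
    return a // b

q = udiv(12, 4, 32, exact=True)
```

Whenever the exact form yields a non-poison value, multiplying the quotient by the divisor recovers the dividend, which is the identity quoted above.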

Example:
<result> = udiv i32 4, %var          ; yields i32:result = 4 / %var

‘sdiv’ Instruction

Syntax:
<result> = sdiv <ty> <op1>, <op2>         ; yields ty:result
<result> = sdiv exact <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘sdiv’ instruction returns the quotient of its two operands.

Arguments:

The two arguments to the ‘sdiv’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the signed integer quotient of the two operands rounded towards zero.

Note that signed integer division and unsigned integer division are distinct operations; for unsigned integer division, use ‘udiv’.

Division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by doing a 32-bit division of -2147483648 by -1.

If the exact keyword is present, the result value of the sdiv is a poison value if the result would be rounded.
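Rounding towards zero differs from the floor division that some languages provide, and Python's `//` operator is one such case. The sketch below models sdiv on signed Python integers; the names sdiv and POISON are hypothetical, and undefined behavior is modeled as an exception purely for illustration.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def sdiv(a: int, b: int, n: int, exact: bool = False):
    """Model `sdiv iN a, b`; a and b are signed values in the iN range."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operand does not fit in the integer type")
    if b == 0 or (a == lo and b == -1):
        # Assumption: undefined behavior is modeled as a Python exception.
        raise ArithmeticError("sdiv has undefined behavior for these operands")
    q = abs(a) // abs(b)          # magnitude of the quotient, truncated
    if (a < 0) != (b < 0):
        q = -q                    # apply the sign: rounding is towards zero
    if exact and q * b != a:
        return POISON             # the result would have been rounded
    return q
```

With these definitions, sdiv(-7, 2, 32) is -3, whereas Python's own -7 // 2 is -4.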

Example:
<result> = sdiv i32 4, %var          ; yields i32:result = 4 / %var

‘fdiv’ Instruction

Syntax:
<result> = fdiv [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘fdiv’ instruction returns the quotient of its two operands.

Arguments:

The two arguments to the ‘fdiv’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.

Semantics:

The value produced is the floating-point quotient of the two operands. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.

Example:
<result> = fdiv float 4.0, %var          ; yields float:result = 4.0 / %var

‘urem’ Instruction

Syntax:
<result> = urem <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘urem’ instruction returns the remainder from the unsigned division of its two arguments.

Arguments:

The two arguments to the ‘urem’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

This instruction returns the unsigned integer remainder of a division. This instruction always performs an unsigned division to get the remainder.

Note that unsigned integer remainder and signed integer remainder are distinct operations; for signed integer remainder, use ‘srem’.

Taking the remainder of a division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior.

Example:
<result> = urem i32 4, %var          ; yields i32:result = 4 % %var

‘srem’ Instruction

Syntax:
<result> = srem <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘srem’ instruction returns the remainder from the signed division of its two operands. This instruction can also take vector versions of the values in which case the elements must be integers.

Arguments:

The two arguments to the ‘srem’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

This instruction returns the remainder of a division (where the result is either zero or has the same sign as the dividend, op1), not the modulo operator (where the result is either zero or has the same sign as the divisor, op2) of a value. For more information about the difference, see The Math Forum. For a table of how this is implemented in various languages, please see Wikipedia: modulo operation.

Note that signed integer remainder and unsigned integer remainder are distinct operations; for unsigned integer remainder, use ‘urem’.

Taking the remainder of a division by zero is undefined behavior. For vectors, if any element of the divisor is zero, the operation has undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by taking the remainder of a 32-bit division of -2147483648 by -1. (The remainder doesn’t actually overflow, but this rule lets srem be implemented using instructions that return both the result of the division and the remainder.)
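The difference between remainder and modulo is easy to see in Python, whose `%` operator is a modulo (the result takes the sign of the divisor). The srem function below is a hypothetical model that takes signed Python integers and models undefined behavior as an exception.

```python
def srem(a: int, b: int, n: int) -> int:
    """Model `srem iN a, b`: the result is zero or has the sign of the dividend."""
    lo = -(1 << (n - 1))
    if b == 0 or (a == lo and b == -1):
        # Assumption: undefined behavior is modeled as a Python exception.
        raise ArithmeticError("srem has undefined behavior for these operands")
    r = abs(a) % abs(b)       # magnitude of the remainder
    return -r if a < 0 else r

remainder = srem(-7, 3, 32)   # sign follows the dividend
modulo = -7 % 3               # Python's %: sign follows the divisor
```

Here the remainder is -1 while the modulo is 2; both satisfy a == q * b + r, but with different quotients (-2 when truncating, -3 when flooring).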

Example:
<result> = srem i32 4, %var          ; yields i32:result = 4 % %var

‘frem’ Instruction

Syntax:
<result> = frem [fast-math flags]* <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘frem’ instruction returns the remainder from the division of its two operands.

Note

The instruction is implemented as a call to libm’s ‘fmod’ for some targets, and using the instruction may thus require linking libm.

Arguments:

The two arguments to the ‘frem’ instruction must be floating-point or vector of floating-point values. Both arguments must have identical types.

Semantics:

The value produced is the floating-point remainder of the two operands. This is the same output as a libm ‘fmod’ function, but without any possibility of setting errno. The remainder has the same sign as the dividend. This instruction is assumed to execute in the default floating-point environment. This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.
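Because frem follows fmod, Python's math.fmod gives a convenient way to explore its results. The wrapper name frem below is hypothetical. Python raises ValueError in the cases where C's fmod returns NaN, so the sketch maps that exception back to NaN; this mapping is an assumption of the model, not something the text above states.

```python
import math

def frem(x: float, y: float) -> float:
    """Model frem via fmod: the result has the same sign as the dividend x."""
    try:
        return math.fmod(x, y)
    except ValueError:
        # math.fmod raises for a zero divisor or an infinite dividend,
        # where C's fmod would return NaN.
        return math.nan

same_sign_as_dividend = frem(-5.5, 2.0)   # -1.5
python_modulo = -5.5 % 2.0                # 0.5, sign follows the divisor
```

As with srem, the built-in `%` operator of Python is a modulo and therefore does not match frem for operands of mixed sign.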

Example:
<result> = frem float 4.0, %var          ; yields float:result = 4.0 % %var

Bitwise Binary Operations

Bitwise binary operators are used to do various forms of bit-twiddling in a program. They are generally very efficient instructions and can commonly be strength reduced from other instructions. They require two operands of the same type, execute an operation on them, and produce a single value. The resulting value is the same type as its operands.

‘shl’ Instruction

Syntax:
<result> = shl <ty> <op1>, <op2>           ; yields ty:result
<result> = shl nuw <ty> <op1>, <op2>       ; yields ty:result
<result> = shl nsw <ty> <op1>, <op2>       ; yields ty:result
<result> = shl nuw nsw <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘shl’ instruction returns the first operand shifted to the left a specified number of bits.

Arguments:

Both arguments to the ‘shl’ instruction must be the same integer or vector of integer type. ‘op2’ is treated as an unsigned value.

Semantics:

The value produced is op1 * 2^op2 mod 2^n, where n is the width of the result. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.

If the nuw keyword is present, then the shift produces a poison value if it shifts out any non-zero bits. If the nsw keyword is present, then the shift produces a poison value if it shifts out any bits that disagree with the resultant sign bit.
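The three poison conditions for shl (oversized shift amount, nuw, nsw) can be sketched in Python as follows. The names shl and POISON are hypothetical; operands are n-bit patterns and the shift amount is assumed to be a non-negative integer, matching its unsigned interpretation.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def shl(a: int, s: int, n: int, nuw: bool = False, nsw: bool = False):
    """Model `shl iN a, s` with optional nuw/nsw flags."""
    mask = (1 << n) - 1
    a &= mask
    if s >= n:
        return POISON                  # shift amount >= bit width
    result = (a << s) & mask           # a * 2^s mod 2^n
    shifted_out = a >> (n - s)         # the top s bits that were discarded
    if nuw and shifted_out != 0:
        return POISON                  # a non-zero bit was shifted out
    if nsw:
        sign = result >> (n - 1)
        # Every discarded bit must equal the sign bit of the result.
        expected = (1 << s) - 1 if sign else 0
        if shifted_out != expected:
            return POISON
    return result
```

On i8, shifting 0xC0 (-64) left by one with nsw gives 0x80 (-128) because the discarded bit matches the new sign bit, while shifting 0x40 (64) left by one with nsw is poison.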

Example:
<result> = shl i32 4, %var   ; yields i32: 4 << %var
<result> = shl i32 4, 2      ; yields i32: 16
<result> = shl i32 1, 10     ; yields i32: 1024
<result> = shl i32 1, 32     ; result is poison
<result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2>   ; yields: result=<2 x i32> < i32 2, i32 4>

‘lshr’ Instruction

Syntax:
<result> = lshr <ty> <op1>, <op2>         ; yields ty:result
<result> = lshr exact <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘lshr’ instruction (logical shift right) returns the first operand shifted to the right a specified number of bits with zero fill.

Arguments:

Both arguments to the ‘lshr’ instruction must be the same integer or vector of integer type. ‘op2’ is treated as an unsigned value.

Semantics:

This instruction always performs a logical shift right operation. The most significant bits of the result will be filled with zero bits after the shift. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.

If the exact keyword is present, the result value of the lshr is a poison value if any of the bits shifted out are non-zero.
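A short Python model of the zero-fill and exact rules, under the same conventions as a plain bit-pattern view of iN values. The names lshr and POISON are hypothetical.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def lshr(a: int, s: int, n: int, exact: bool = False):
    """Model `lshr iN a, s`: shift right, filling the high bits with zeros."""
    mask = (1 << n) - 1
    a &= mask                          # view the operand as an n-bit pattern
    if s >= n:
        return POISON                  # shift amount >= bit width
    if exact and a & ((1 << s) - 1):
        return POISON                  # a non-zero bit would be shifted out
    return a >> s
```

For instance, lshr(-2, 1, 8) first views -2 as the pattern 0xFE and then yields 0x7F.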

Example:
<result> = lshr i32 4, 1   ; yields i32:result = 2
<result> = lshr i32 4, 2   ; yields i32:result = 1
<result> = lshr i8  4, 3   ; yields i8:result = 0
<result> = lshr i8 -2, 1   ; yields i8:result = 0x7F
<result> = lshr i32 1, 32  ; result is poison
<result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2>   ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>

‘ashr’ Instruction

Syntax:
<result> = ashr <ty> <op1>, <op2>         ; yields ty:result
<result> = ashr exact <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘ashr’ instruction (arithmetic shift right) returns the first operand shifted to the right a specified number of bits with sign extension.

Arguments:

Both arguments to the ‘ashr’ instruction must be the same integer or vector of integer type. ‘op2’ is treated as an unsigned value.

Semantics:

This instruction always performs an arithmetic shift right operation. The most significant bits of the result will be filled with the sign bit of op1. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, this instruction returns a poison value. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.

If the exact keyword is present, the result value of the ashr is a poison value if any of the bits shifted out are non-zero.
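The sign-fill behaviour can be modeled in Python by reinterpreting the n-bit pattern as a signed value before shifting, since Python's `>>` on negative integers is itself arithmetic. The names ashr and POISON are hypothetical; results are returned as n-bit patterns.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def ashr(a: int, s: int, n: int, exact: bool = False):
    """Model `ashr iN a, s`: shift right, filling the high bits with the sign bit."""
    mask = (1 << n) - 1
    a &= mask
    if s >= n:
        return POISON                  # shift amount >= bit width
    if exact and a & ((1 << s) - 1):
        return POISON                  # a non-zero bit would be shifted out
    signed = a - (1 << n) if a >> (n - 1) else a
    return (signed >> s) & mask        # back to an n-bit pattern
```

On i8, ashr(-2, 1, 8) returns 0xFF, the pattern for -1, whereas a logical shift of the same operand would give 0x7F.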

Example:
<result> = ashr i32 4, 1   ; yields i32:result = 2
<result> = ashr i32 4, 2   ; yields i32:result = 1
<result> = ashr i8  4, 3   ; yields i8:result = 0
<result> = ashr i8 -2, 1   ; yields i8:result = -1
<result> = ashr i32 1, 32  ; result is poison
<result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3>   ; yields: result=<2 x i32> < i32 -1, i32 0>

‘and’ Instruction

Syntax:
<result> = and <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘and’ instruction returns the bitwise logical and of its two operands.

Arguments:

The two arguments to the ‘and’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The truth table used for the ‘and’ instruction is:

In0   In1   Out
0     0     0
0     1     0
1     0     0
1     1     1

Example:
<result> = and i32 4, %var         ; yields i32:result = 4 & %var
<result> = and i32 15, 40          ; yields i32:result = 8
<result> = and i32 4, 8            ; yields i32:result = 0

‘or’ Instruction

Syntax:
<result> = or <ty> <op1>, <op2>            ; yields ty:result
<result> = or disjoint <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘or’ instruction returns the bitwise logical inclusive or of its two operands.

Arguments:

The two arguments to the ‘or’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The truth table used for the ‘or’ instruction is:

In0   In1   Out
0     0     0
0     1     1
1     0     1
1     1     1

disjoint means that for each bit, that bit is zero in at least one of the inputs. This allows the Or to be treated as an Add since no carry can occur from any bit. If the disjoint keyword is present, the result value of the or is a poison value if both inputs have a one in the same bit position. For vectors, only the element containing the bit is poison.
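The claim that a disjoint or behaves like an add can be checked with a small Python model. The names or_int and POISON are hypothetical; operands are treated as n-bit patterns.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def or_int(a: int, b: int, n: int, disjoint: bool = False):
    """Model `or iN a, b` with the optional disjoint flag."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if disjoint and (a & b) != 0:
        return POISON      # some bit position is set in both inputs
    return a | b

no_shared_bits = or_int(4, 8, 32, disjoint=True)    # 12, the same as 4 + 8
shared_bit = or_int(15, 40, 32, disjoint=True)      # bit 3 is set in both
```

When no bit is shared, each result bit is the sum of at most one set input bit, so no carry is generated and the or equals the add.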

Example:
<result> = or i32 4, %var         ; yields i32:result = 4 | %var
<result> = or i32 15, 40          ; yields i32:result = 47
<result> = or i32 4, 8            ; yields i32:result = 12

‘xor’ Instruction

Syntax:
<result> = xor <ty> <op1>, <op2>   ; yields ty:result
Overview:

The ‘xor’ instruction returns the bitwise logical exclusive or of its two operands. The xor is used to implement the “one’s complement” operation, which is the “~” operator in C.

Arguments:

The two arguments to the ‘xor’ instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The truth table used for the ‘xor’ instruction is:

In0   In1   Out
0     0     0
0     1     1
1     0     1
1     1     0

Example:
<result> = xor i32 4, %var         ; yields i32:result = 4 ^ %var
<result> = xor i32 15, 40          ; yields i32:result = 39
<result> = xor i32 4, 8            ; yields i32:result = 12
<result> = xor i32 %V, -1          ; yields i32:result = ~%V

Vector Operations

LLVM supports several instructions to represent vector operations in a target-independent manner. These instructions cover the element-access and vector-specific operations needed to process vectors effectively. While LLVM does directly support these vector operations, many sophisticated algorithms will want to use target-specific intrinsics to take full advantage of a specific target.

‘extractelement’ Instruction

Syntax:
<result> = extractelement <n x <ty>> <val>, <ty2> <idx>  ; yields <ty>
<result> = extractelement <vscale x n x <ty>> <val>, <ty2> <idx> ; yields <ty>
Overview:

The ‘extractelement’ instruction extracts a single scalar element from a vector at a specified index.

Arguments:

The first operand of an ‘extractelement’ instruction is a value of vector type. The second operand is an index indicating the position from which to extract the element. The index may be a variable of any integer type, and will be treated as an unsigned integer.

Semantics:

The result is a scalar of the same type as the element type of val. Its value is the value at position idx of val. If idx exceeds the length of val for a fixed-length vector, the result is a poison value. For a scalable vector, if the value of idx exceeds the runtime length of the vector, the result is a poison value.
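A sketch of the indexing rule for fixed-length vectors, with a Python list standing in for the vector. The names extractelement and POISON and the idx_bits parameter (the bit width of the index type) are hypothetical; the model assumes that any index that is not a valid position yields poison.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def extractelement(vec, idx: int, idx_bits: int = 32):
    """Model extractelement on a fixed-length vector given as a list."""
    idx &= (1 << idx_bits) - 1     # the index is treated as unsigned
    if idx >= len(vec):
        return POISON              # out of range: the result is poison
    return vec[idx]
```

Note that an i32 index of -1 is read as 4294967295, so it is out of range rather than a reference to the last element.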

Example:
<result> = extractelement <4 x i32> %vec, i32 0    ; yields i32

‘insertelement’ Instruction

Syntax:
<result> = insertelement <n x <ty>> <val>, <ty> <elt>, <ty2> <idx>    ; yields <n x <ty>>
<result> = insertelement <vscale x n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <vscale x n x <ty>>
Overview:

The ‘insertelement’ instruction inserts a scalar element into a vector at a specified index.

Arguments:

The first operand of an ‘insertelement’ instruction is a value of vector type. The second operand is a scalar value whose type must equal the element type of the first operand. The third operand is an index indicating the position at which to insert the value. The index may be a variable of any integer type, and will be treated as an unsigned integer.

Semantics:

The result is a vector of the same type as val. Its element values are those of val except at position idx, where it gets the value elt. If idx exceeds the length of val for a fixed-length vector, the result is a poison value. For a scalable vector, if the value of idx exceeds the runtime length of the vector, the result is a poison value.
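Since IR values are in SSA form, insertelement produces a new vector and leaves its operand untouched. The Python sketch below reflects that by copying the input list. The names insertelement and POISON and the idx_bits parameter are hypothetical, and an out-of-range index is assumed to make the whole result poison.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def insertelement(vec, elt, idx: int, idx_bits: int = 32):
    """Model insertelement on a fixed-length vector given as a list."""
    idx &= (1 << idx_bits) - 1     # the index is treated as unsigned
    if idx >= len(vec):
        return POISON              # out of range: the whole result is poison
    out = list(vec)                # the operand itself is not modified
    out[idx] = elt
    return out

original = [0, 0, 0, 0]
updated = insertelement(original, 1, 0)
```

After the call, original is unchanged and updated holds the new element at position 0.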

Example:
<result> = insertelement <4 x i32> %vec, i32 1, i32 0    ; yields <4 x i32>

‘shufflevector’ Instruction

Syntax:
<result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask>    ; yields <m x <ty>>
<result> = shufflevector <vscale x n x <ty>> <v1>, <vscale x n x <ty>> <v2>, <vscale x m x i32> <mask>  ; yields <vscale x m x <ty>>
Overview:

The ‘shufflevector’ instruction constructs a permutation of elements from two input vectors, returning a vector with the same element type as the input and length that is the same as the shuffle mask.

Arguments:

The first two operands of a ‘shufflevector’ instruction are vectors with the same type. The third argument is a shuffle mask vector constant whose element type is i32. The mask vector elements must be constant integers or poison values. The result of the instruction is a vector whose length is the same as the shuffle mask and whose element type is the same as the element type of the first two operands.

Semantics:

The elements of the two input vectors are numbered from left to right across both of the vectors. For each element of the result vector, the shuffle mask selects an element from one of the input vectors to copy to the result. Non-negative elements in the mask represent an index into the concatenated pair of input vectors.

A poison element in the mask vector specifies that the resulting element is poison. For backwards-compatibility reasons, LLVM temporarily also accepts undef mask elements, which will be interpreted the same way as poison elements. If the shuffle mask selects an undef element from one of the input vectors, the resulting element is undef.

For scalable vectors, the only valid mask values at present are zeroinitializer, undef and poison, since we cannot write all indices as literals for a vector with a length unknown at compile time.
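The numbering of elements across both inputs can be modeled for fixed-length vectors by concatenating two Python lists and indexing into the result. The names shufflevector and POISON are hypothetical; undef mask elements and scalable vectors are outside this sketch.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def shufflevector(v1, v2, mask):
    """Model shufflevector on fixed-length vectors given as lists."""
    if len(v1) != len(v2):
        raise ValueError("both input vectors must have the same type")
    both = list(v1) + list(v2)     # indices 0..n-1 are v1, n..2n-1 are v2
    # A poison mask element gives a poison result element.
    return [POISON if m is POISON else both[m] for m in mask]

v1 = [1, 2, 3, 4]
v2 = [5, 6, 7, 8]
interleaved = shufflevector(v1, v2, [0, 4, 1, 5])
```

The result length follows the mask, so the same operation can interleave, narrow, or concatenate its inputs depending on the mask chosen.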

Example:
<result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
                        <4 x i32> <i32 0, i32 4, i32 1, i32 5>  ; yields <4 x i32>
<result> = shufflevector <4 x i32> %v1, <4 x i32> poison,
                        <4 x i32> <i32 0, i32 1, i32 2, i32 3>  ; yields <4 x i32> - Identity shuffle.
<result> = shufflevector <8 x i32> %v1, <8 x i32> poison,
                        <4 x i32> <i32 0, i32 1, i32 2, i32 3>  ; yields <4 x i32>
<result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
                        <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7 >  ; yields <8 x i32>

Aggregate Operations

LLVM supports several instructions for working with aggregate values.

‘extractvalue’ Instruction

Syntax:
<result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}*
Overview:

The ‘extractvalue’ instruction extracts the value of a member field from an aggregate value.

Arguments:

The first operand of an ‘extractvalue’ instruction is a value of struct or array type. The other operands are constant indices to specify which value to extract in a similar manner as indices in a ‘getelementptr’ instruction.

The major differences to getelementptr indexing are:

  • Since the value being indexed is not a pointer, the first index is omitted and assumed to be zero.

  • At least one index must be specified.

  • Not only struct indices but also array indices must be in bounds.

Semantics:

The result is the value at the position in the aggregate specified by the index operands.
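The index path can be modeled in Python with nested tuples standing in for structs and arrays. The function name extractvalue is reused for a hypothetical helper; the explicit checks mirror the rules that at least one index is required and that every index must be in bounds.

```python
def extractvalue(agg, *indices):
    """Model extractvalue: follow a path of constant indices into an aggregate."""
    if not indices:
        raise ValueError("at least one index must be specified")
    for i in indices:
        if not 0 <= i < len(agg):
            raise IndexError("aggregate index out of bounds")
        agg = agg[i]               # descend one level per index
    return agg

nested = (1, (2.5,))               # models a value of type {i32, {float}}
inner = extractvalue(nested, 1, 0)
```

Unlike getelementptr, there is no leading index for a pointer operand: the first index already selects a member of the aggregate.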

Example:
<result> = extractvalue {i32, float} %agg, 0    ; yields i32

‘insertvalue’ Instruction

Syntax:
<result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}*    ; yields <aggregate type>
Overview:

The ‘insertvalue’ instruction inserts a value into a member field in an aggregate value.

Arguments:

The first operand of an ‘insertvalue’ instruction is a value of struct or array type. The second operand is a first-class value to insert. The following operands are constant indices indicating the position at which to insert the value in a similar manner as indices in a ‘extractvalue’ instruction. The value to insert must have the same type as the value identified by the indices.

Semantics:

The result is an aggregate of the same type as val. Its value is that of val except that the value at the position specified by the indices is that of elt.
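A functional-update model in Python, again using nested tuples for aggregates. The names insertvalue and POISON are hypothetical; the sketch rebuilds only the tuples along the index path and does not check that the inserted value has the right type.

```python
POISON = object()  # hypothetical sentinel standing in for an LLVM poison value

def insertvalue(agg, elt, *indices):
    """Model insertvalue: return a copy of agg with one nested member replaced."""
    if not indices:
        raise ValueError("at least one index must be specified")
    i, rest = indices[0], indices[1:]
    if not 0 <= i < len(agg):
        raise IndexError("aggregate index out of bounds")
    # Recurse into the selected member until the index path is exhausted.
    new_member = insertvalue(agg[i], elt, *rest) if rest else elt
    return agg[:i] + (new_member,) + agg[i + 1:]

agg1 = insertvalue((POISON, POISON), 1, 0)            # {i32 1, float poison}
agg2 = insertvalue(agg1, 2.5, 1)                      # {i32 1, float 2.5}
agg3 = insertvalue((POISON, (POISON,)), 2.5, 1, 0)    # {i32 poison, {float 2.5}}
```

Building a struct from poison one field at a time, as in agg1 and agg2, is the usual pattern for constructing an aggregate value.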

Example:
%agg1 = insertvalue {i32, float} poison, i32 1, 0              ; yields {i32 1, float poison}
%agg2 = insertvalue {i32, float} %agg1, float %val, 1          ; yields {i32 1, float %val}
%agg3 = insertvalue {i32, {float}} poison, float %val, 1, 0    ; yields {i32 poison, {float %val}}

Memory Access and Addressing Operations

A key design point of an SSA-based representation is how it represents memory. In LLVM, no memory locations are in SSA form, which makes things very simple. This section describes how to read, write, and allocate memory in LLVM.

‘alloca’ Instruction

Syntax:
<result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>] [, addrspace(<num>)]     ; yields type addrspace(num)*:result
Overview:

The ‘alloca’ instruction allocates memory on the stack frame of the currently executing function, to be automatically released when this function returns to its caller. If the address space is not explicitly specified, the default address space 0 is used.

Arguments:

The ‘alloca’ instruction allocates sizeof(<type>)*NumElements bytes of memory on the runtime stack, returning a pointer of the appropriate type to the program. If “NumElements” is specified, it is the number of elements allocated, otherwise “NumElements” is defaulted to be one.

If a constant alignment is specified, the value result of the allocation is guaranteed to be aligned to at least that boundary. The alignment may not be greater than 1 << 32.

The alignment is only optional when parsing textual IR; for in-memory IR, it is always present. If not specified, the target can choose to align the allocation on any convenient boundary compatible with the type.

‘type’ may be any sized type.

Structs containing scalable vectors cannot be used in allocas unless all fields are the same scalable vector type (e.g. {<vscale x 2 x i32>, <vscale x 2 x i32>} contains the same type while {<vscale x 2 x i32>, <vscale x 2 x i64>} doesn’t).

Semantics:

Memory is allocated; a pointer is returned. The allocated memory is uninitialized, and loading from uninitialized memory produces an undefined value. The operation itself is undefined if there is insufficient stack space for the allocation. ‘alloca’d memory is automatically released when the function returns. The ‘alloca’ instruction is commonly used to represent automatic variables that must have an address available. When the function returns (either with the ret or resume instructions), the memory is reclaimed. Allocating zero bytes is legal, but the returned pointer may not be unique. The order in which memory is allocated (i.e., which way the stack grows) is not specified.

Note that ‘alloca’ outside of the alloca address space from the datalayout string is meaningful only if the target has assigned it a semantics. For targets that specify a non-zero alloca address space in the datalayout string, the alloca address space needs to be explicitly specified in the instruction if it is to be used.

If the returned pointer is used by llvm.lifetime.start, the returned object is initially dead. See llvm.lifetime.start and llvm.lifetime.end for the precise semantics of lifetime-manipulating intrinsics.

Example:
%ptr = alloca i32                             ; yields ptr
%ptr = alloca i32, i32 4                      ; yields ptr
%ptr = alloca i32, i32 4, align 1024          ; yields ptr
%ptr = alloca i32, align 1024                 ; yields ptr

‘load’ Instruction

Syntax:
<result> = load [volatile] <ty>, ptr <pointer>[, align <alignment>][, !nontemporal !<nontemp_node>][, !invariant.load !<empty_node>][, !invariant.group !<empty_node>][, !nonnull !<empty_node>][, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !align !<align_node>][, !noundef !<empty_node>]
<result> = load atomic [volatile] <ty>, ptr <pointer> [syncscope("<target-scope>")] <ordering>, align <alignment> [, !invariant.group !<empty_node>]
!<nontemp_node> = !{ i32 1 }
!<empty_node> = !{}
!<deref_bytes_node> = !{ i64 <dereferenceable_bytes> }
!<align_node> = !{ i64 <value_alignment> }
Overview:

The ‘load’ instruction is used to read from memory.

Arguments:

The argument to the load instruction specifies the memory address from which to load. The type specified must be a first class type of known size (i.e. not containing an opaque structural type). If the load is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this load with other volatile operations.

If the load is marked as atomic, it takes an extra ordering and optional syncscope("<target-scope>") argument. The release and acq_rel orderings are not valid on load instructions. Atomic loads produce defined results when they may see multiple atomic stores. The type of the pointee must be an integer, pointer, or floating-point type whose bit width is a power of two greater than or equal to eight. align must be explicitly specified on atomic loads. Note: if the alignment is not greater than or equal to the size of the <value> type, the atomic operation is likely to require a lock and have poor performance. !nontemporal does not have any defined semantics for atomic loads.

The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe. The maximum possible alignment is 1 << 32. An alignment value higher than the size of the loaded type implies memory up to the alignment value bytes can be safely loaded without trapping in the default address space. Access of the high bytes can interfere with debugging tools, so should not be accessed if the function has the sanitize_thread or sanitize_address attributes.

The alignment is only optional when parsing textual IR; for in-memory IR, it is always present. An omitted align argument means that the operation has the ABI alignment for the target.

The optional !nontemporal metadata must reference a single metadata name <nontemp_node> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this load is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.

The optional !invariant.load metadata must reference a single metadata name <empty_node> corresponding to a metadata node with no entries. If a load instruction tagged with the !invariant.load metadata is executed, the memory location referenced by the load has to contain the same value at all points in the program where the memory location is dereferenceable; otherwise, the behavior is undefined.

The optional !invariant.group metadata must reference a single metadata name <empty_node> corresponding to a metadata node with no entries. See invariant.group metadata.

The optional !nonnull metadata must reference a single metadata name <empty_node> corresponding to a metadata node with no entries. The existence of the !nonnull metadata on the instruction tells the optimizer that the value loaded is known to never be null. If the value is null at runtime, a poison value is returned instead. This is analogous to the nonnull attribute on parameters and return values. This metadata can only be applied to loads of a pointer type.

The optional !dereferenceable metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. See dereferenceable metadata.

The optional !dereferenceable_or_null metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. See dereferenceable_or_null metadata.

The optional !align metadata must reference a single metadata name <align_node> corresponding to a metadata node with one i64 entry. The existence of the !align metadata on the instruction tells the optimizer that the value loaded is known to be aligned to a boundary specified by the integer value in the metadata node. The alignment must be a power of 2. This is analogous to the ‘align’ attribute on parameters and return values. This metadata can only be applied to loads of a pointer type. If the returned value is not appropriately aligned at runtime, a poison value is returned instead.

The optional !noundef metadata must reference a single metadata name <empty_node> corresponding to a node with no entries. The existence of !noundef metadata on the instruction tells the optimizer that the value loaded is known to be well defined. If the value isn’t well defined, the behavior is undefined. If the !noundef metadata is combined with poison-generating metadata like !nonnull, violation of that metadata constraint will also result in undefined behavior.

Semantics:

The location of memory pointed to is loaded. If the value being loaded is of scalar type then the number of bytes read does not exceed the minimum number of bytes needed to hold all bits of the type. For example, loading an i24 reads at most three bytes. When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type. If the value being loaded is of aggregate type, the bytes that correspond to padding may be accessed but are ignored, because it is impossible to observe padding from the loaded aggregate value. If <pointer> is not a well-defined value, the behavior is undefined.

Examples:
%ptr = alloca i32                               ; yields ptr
store i32 3, ptr %ptr                           ; yields void
%val = load i32, ptr %ptr                       ; yields i32:val = i32 3

‘store’ Instruction

Syntax:
store [volatile] <ty> <value>, ptr <pointer>[, align <alignment>][, !nontemporal !<nontemp_node>][, !invariant.group !<empty_node>]        ; yields void
store atomic [volatile] <ty> <value>, ptr <pointer> [syncscope("<target-scope>")] <ordering>, align <alignment> [, !invariant.group !<empty_node>] ; yields void
!<nontemp_node> = !{ i32 1 }
!<empty_node> = !{}
Overview:

The ‘store’ instruction is used to write to memory.

Arguments:

There are two arguments to the store instruction: a value to store and an address at which to store it. The type of the <pointer> operand must be a pointer to the first class type of the <value> operand. If the store is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this store with other volatile operations. Only values of first class types of known size (i.e. not containing an opaque structural type) can be stored.

If the store is marked as atomic, it takes an extra ordering and optional syncscope("<target-scope>") argument. The acquire and acq_rel orderings aren’t valid on store instructions. Atomic loads produce defined results when they may see multiple atomic stores. The type of the pointee must be an integer, pointer, or floating-point type whose bit width is a power of two greater than or equal to eight. align must be explicitly specified on atomic stores. Note: if the alignment is not greater than or equal to the size of the <value> type, the atomic operation is likely to require a lock and have poor performance. !nontemporal does not have any defined semantics for atomic stores.

The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe. The maximum possible alignment is 1 << 32. An alignment value higher than the size of the loaded type implies memory up to the alignment value bytes can be safely loaded without trapping in the default address space. Access of the high bytes can interfere with debugging tools, so should not be accessed if the function has the sanitize_thread or sanitize_address attributes.

The alignment is only optional when parsing textual IR; for in-memory IR, it is always present. An omitted align argument means that the operation has the ABI alignment for the target.

The optional !nontemporal metadata must reference a single metadata name <nontemp_node> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this store is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.

The optional !invariant.group metadata must reference a single metadata name <empty_node>. See invariant.group metadata.

Semantics:

The contents of memory are updated to contain <value> at the location specified by the <pointer> operand. If <value> is of scalar type then the number of bytes written does not exceed the minimum number of bytes needed to hold all bits of the type. For example, storing an i24 writes at most three bytes. When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten. If <value> is of aggregate type, padding is filled with undef. If <pointer> is not a well-defined value, the behavior is undefined.

Example:
%ptr = alloca i32                               ; yields ptr
store i32 3, ptr %ptr                           ; yields void
%val = load i32, ptr %ptr                       ; yields i32:val = i32 3

‘fence’ Instruction

Syntax:
fence [syncscope("<target-scope>")] <ordering>  ; yields void
Overview:

The ‘fence’ instruction is used to introduce happens-before edges between operations.

Arguments:

‘fence’ instructions take an ordering argument which defines what synchronizes-with edges they add. They can only be given acquire, release, acq_rel, and seq_cst orderings.

Semantics:

A fence A which has (at least) release ordering semantics synchronizes with a fence B with (at least) acquire ordering semantics if and only if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M (either directly or through some side effect of a sequence headed by X), Y is sequenced before B, and Y observes M. This provides a happens-before dependency between A and B. Rather than an explicit fence, one (but not both) of the atomic operations X or Y might provide a release or acquire (resp.) ordering constraint and still synchronize-with the explicit fence and establish the happens-before edge.

A fence which has seq_cst ordering, in addition to having both acquire and release semantics specified above, participates in the global program order of other seq_cst operations and/or fences. Furthermore, the global ordering created by a seq_cst fence must be compatible with the individual total orders of monotonic (or stronger) memory accesses occurring before and after such a fence. The exact semantics of this interaction are somewhat complicated; see the C++ standard’s [atomics.order] section for more details.

A fence instruction can also take an optional “syncscope” argument.

Example:
fence acquire                                        ; yields void
fence syncscope("singlethread") seq_cst              ; yields void
fence syncscope("agent") seq_cst                     ; yields void
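The fence-to-fence rule in the Semantics paragraph can be illustrated with C11 atomics, whose atomic_thread_fence plays a role comparable to the IR fence and whose memory_order_relaxed corresponds to LLVM's monotonic. This is a minimal sketch under those assumptions; the names payload, ready, publish and try_consume are hypothetical, and the labels A, B, X, Y, M in the comments refer to the roles named in the paragraph above.

```c
#include <stdatomic.h>

static int payload;        /* plain, non-atomic data */
static atomic_int ready;   /* the atomic object M; static storage starts at zero */

/* Writer: fence A (release) is sequenced before the atomic store X. */
static void publish(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);               /* A */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);  /* X modifies M */
}

/* Reader: the atomic load Y is sequenced before fence B (acquire). */
static int try_consume(int *out) {
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)  /* Y observes M */
        return 0;
    atomic_thread_fence(memory_order_acquire);               /* B */
    *out = payload;   /* ordered after the writer's store to payload */
    return 1;
}
```

If the writer and reader run on different threads, a reader that sees ready == 1 is also guaranteed to see the payload written before fence A.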

‘cmpxchg’ Instruction

Syntax:
cmpxchg [weak] [volatile] ptr <pointer>, <ty> <cmp>, <ty> <new> [syncscope("<target-scope>")] <success ordering> <failure ordering>[, align <alignment>] ; yields  { ty, i1 }
Overview:

The ‘cmpxchg’ instruction is used to atomically modify memory. It loads a value in memory and compares it to a given value. If they are equal, it tries to store a new value into the memory.

Arguments:

There are three arguments to the ‘cmpxchg’ instruction: an address to operate on, a value to compare to the value currently at that address, and a new value to place at that address if the compared values are equal. The type of ‘<cmp>’ must be an integer or pointer type whose bit width is a power of two greater than or equal to eight. ‘<cmp>’ and ‘<new>’ must have the same type, and the type of ‘<pointer>’ must be a pointer to that type. If the cmpxchg is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this cmpxchg with other volatile operations.

The success and failure ordering arguments specify how this cmpxchg synchronizes with other atomic operations. Both ordering parameters must be at least monotonic; the failure ordering cannot be either release or acq_rel.

A cmpxchg instruction can also take an optional “syncscope” argument.

Note: if the alignment is not greater than or equal to the size of the <value> type, the atomic operation is likely to require a lock and have poor performance.

The alignment is only optional when parsing textual IR; for in-memory IR, it is always present. If unspecified, the alignment is assumed to be equal to the size of the ‘<value>’ type. Note that this default alignment assumption is different from the alignment used for the load/store instructions when align isn’t specified.

The pointer passed into cmpxchg must have alignment greater than or equal to the size in memory of the operand.

Semantics:

The contents of memory at the location specified by the ‘<pointer>’ operand is read and compared to ‘<cmp>’; if the values are equal, ‘<new>’ is written to the location. The original value at the location is returned, together with a flag indicating success (true) or failure (false).

If the cmpxchg operation is marked as weak then a spurious failure is permitted: the operation may not write <new> even if the comparison matched.

If the cmpxchg operation is strong (the default), the i1 value is 1 if and only if the value loaded equals cmp.

A successful cmpxchg is a read-modify-write instruction for the purpose of identifying release sequences. A failed cmpxchg is equivalent to an atomic load with an ordering parameter determined by the second ordering parameter.

Example:
entry:
  %orig = load atomic i32, ptr %ptr unordered, align 4                      ; yields i32
  br label %loop

loop:
  %cmp = phi i32 [ %orig, %entry ], [%value_loaded, %loop]
  %squared = mul i32 %cmp, %cmp
  %val_success = cmpxchg ptr %ptr, i32 %cmp, i32 %squared acq_rel monotonic ; yields  { i32, i1 }
  %value_loaded = extractvalue { i32, i1 } %val_success, 0
  %success = extractvalue { i32, i1 } %val_success, 1
  br i1 %success, label %done, label %loop

done:
  ...
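For readers more familiar with C11 atomics, the retry loop above can be sketched roughly as follows. This is an approximation, not an exact translation: C has no equivalent of the unordered ordering, so the initial load uses memory_order_relaxed (LLVM's monotonic), and the function name atomic_square is hypothetical. The sketch also assumes unsigned int is 32 bits wide so that the multiplication wraps like mul i32.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically replaces *p with its square and returns the value that was squared. */
static uint32_t atomic_square(_Atomic uint32_t *p) {
    uint32_t expected = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure the call stores the value it observed into `expected`,
       which plays the role of %value_loaded feeding the phi node. */
    while (!atomic_compare_exchange_strong_explicit(
               p, &expected, expected * expected,
               memory_order_acq_rel,     /* success ordering */
               memory_order_relaxed)) {  /* failure ordering (monotonic) */
        /* retry with the freshly observed value */
    }
    return expected;
}
```

The strong variant is used here because the IR example omits weak; a weak compare-exchange would also be correct inside a retry loop like this one.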

‘atomicrmw’ Instruction

Syntax:
atomicrmw [volatile] <operation> ptr <pointer>, <ty> <value> [syncscope("<target-scope>")] <ordering>[, align <alignment>]  ; yields ty
Overview:

The ‘atomicrmw’ instruction is used to atomically modify memory.

Arguments:

There are three arguments to the ‘atomicrmw’ instruction: an operation to apply, an address whose value to modify, and an argument to the operation. The operation must be one of the following keywords:

  • xchg

  • add

  • sub

  • and

  • nand

  • or

  • xor

  • max

  • min

  • umax

  • umin

  • fadd

  • fsub

  • fmax

  • fmin

  • fmaximum

  • fminimum

  • uinc_wrap

  • udec_wrap

  • usub_cond

  • usub_sat

For most of these operations, the type of ‘<value>’ must be an integer type whose bit width is a power of two greater than or equal to eight. For xchg, this may also be a floating point or a pointer type with the same size constraints as integers. For fadd/fsub/fmax/fmin/fmaximum/fminimum, this must be a floating-point or fixed vector of floating-point type. The type of the ‘<pointer>’ operand must be a pointer to that type. If the atomicrmw is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this atomicrmw with other volatile operations.

Note: if the alignment is not greater than or equal to the size of the <value> type, the atomic operation is likely to require a lock and have poor performance.

The alignment is only optional when parsing textual IR; for in-memory IR, it is always present. If unspecified, the alignment is assumed to be equal to the size of the ‘<value>’ type. Note that this default alignment assumption is different from the alignment used for the load/store instructions when align isn’t specified.

An atomicrmw instruction can also take an optional “syncscope” argument.

Semantics:

The contents of memory at the location specified by the ‘<pointer>’ operand are atomically read, modified, and written back. The original value at the location is returned. The modification is specified by the operation argument:

  • xchg: *ptr = val

  • add: *ptr = *ptr + val

  • sub: *ptr = *ptr - val

  • and: *ptr = *ptr & val

  • nand: *ptr = ~(*ptr & val)

  • or: *ptr = *ptr | val

  • xor: *ptr = *ptr ^ val

  • max: *ptr = *ptr > val ? *ptr : val (using a signed comparison)

  • min: *ptr = *ptr < val ? *ptr : val (using a signed comparison)

  • umax: *ptr = *ptr > val ? *ptr : val (using an unsigned comparison)

  • umin: *ptr = *ptr < val ? *ptr : val (using an unsigned comparison)

  • fadd: *ptr = *ptr + val (using floating point arithmetic)

  • fsub: *ptr = *ptr - val (using floating point arithmetic)

  • fmax: *ptr = maxnum(*ptr, val) (match the llvm.maxnum.* intrinsic)

  • fmin: *ptr = minnum(*ptr, val) (match the llvm.minnum.* intrinsic)

  • fmaximum: *ptr = maximum(*ptr, val) (match the llvm.maximum.* intrinsic)

  • fminimum: *ptr = minimum(*ptr, val) (match the llvm.minimum.* intrinsic)

  • uinc_wrap: *ptr = (*ptr u>= val) ? 0 : (*ptr + 1) (increment value with wraparound to zero when incremented above input value)

  • udec_wrap: *ptr = ((*ptr == 0) || (*ptr u> val)) ? val : (*ptr - 1) (decrement with wraparound to input value when decremented below zero).

  • usub_cond: *ptr = (*ptr u>= val) ? *ptr - val : *ptr (subtract only if no unsigned overflow).

  • usub_sat: *ptr = (*ptr u>= val) ? *ptr - val : 0 (subtract with unsigned clamping to zero).
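The four unsigned wrapping and conditional operations at the end of the list are the least familiar, so the sketch below restates their formulas as plain C functions on 32-bit values. These are non-atomic models of the new value only: each takes the old contents of memory and the operand and returns what would be written back. The function names simply reuse the operation keywords.

```c
#include <stdint.h>

/* New value stored by each operation, given the old value and the operand. */
static uint32_t uinc_wrap(uint32_t old, uint32_t val) {
    return old >= val ? 0 : old + 1;
}

static uint32_t udec_wrap(uint32_t old, uint32_t val) {
    return (old == 0 || old > val) ? val : old - 1;
}

static uint32_t usub_cond(uint32_t old, uint32_t val) {
    return old >= val ? old - val : old;
}

static uint32_t usub_sat(uint32_t old, uint32_t val) {
    return old >= val ? old - val : 0;
}
```

For instance, with an operand of 3, uinc_wrap steps a counter through 0, 1, 2, 3 and back to 0, while udec_wrap walks the same range downward.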

Example:
%old = atomicrmw add ptr %ptr, i32 1 acquire                        ; yields i32
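A rough C11 counterpart of this example is atomic_fetch_add_explicit, which likewise returns the value that was in memory before the addition. The wrapper name fetch_increment is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Models: %old = atomicrmw add ptr %ptr, i32 1 acquire */
static uint32_t fetch_increment(_Atomic uint32_t *ptr) {
    /* Returns the original value; the memory location now holds original + 1. */
    return atomic_fetch_add_explicit(ptr, 1u, memory_order_acquire);
}
```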

‘getelementptr’ Instruction

Syntax:
<result> = getelementptr <ty>, ptr <ptrval>{, <ty> <idx>}*
<result> = getelementptr inbounds <ty>, ptr <ptrval>{, <ty> <idx>}*
<result> = getelementptr nusw <ty>, ptr <ptrval>{, <ty> <idx>}*
<result> = getelementptr nuw <ty>, ptr <ptrval>{, <ty> <idx>}*
<result> = getelementptr inrange(S,E) <ty>, ptr <ptrval>{, <ty> <idx>}*
<result> = getelementptr <ty>, <N x ptr> <ptrval>, <vector index type> <idx>
Overview:

The ‘getelementptr’ instruction is used to get the address of a subelement of an aggregate data structure. It performs address calculation only and does not access memory. The instruction can also be used to calculate a vector of such addresses.

Arguments:

The first argument is always a type used as the basis for the calculations. The second argument is always a pointer or a vector of pointers, and is the base address to start from. The remaining arguments are indices that indicate which of the elements of the aggregate object are indexed. The interpretation of each index is dependent on the type being indexed into. The first index always indexes the pointer value given as the second argument, the second index indexes a value of the type pointed to (not necessarily the value directly pointed to, since the first index can be non-zero), etc. The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs. Note that subsequent types being indexed into can never be pointers, since that would require loading the pointer before continuing calculation.

The type of each index argument depends on the type it is indexing into. When indexing into a (optionally packed) structure, only i32 integer constants are allowed (when using a vector of indices they must all be the same i32 integer constant). When indexing into an array, pointer or vector, integers of any width are allowed, and they are not required to be constant. These integers are treated as signed values where relevant.

For example, let’s consider a C code fragment and how it gets compiled to LLVM:

struct RT {
  char A;
  int B[10][20];
  char C;
};
struct ST {
  int X;
  double Y;
  struct RT Z;
};

int *foo(struct ST *s) {
  return &s[1].Z.B[5][13];
}

The LLVM code generated by Clang is approximately:

%struct.RT = type { i8, [10 x [20 x i32]], i8 }
%struct.ST = type { i32, double, %struct.RT }

define ptr @foo(ptr %s) {
entry:
  %arrayidx = getelementptr inbounds %struct.ST, ptr %s, i64 1, i32 2, i32 1, i64 5, i64 13
  ret ptr %arrayidx
}
Semantics:

In the example above, the first index is indexing into the ‘%struct.ST*’ type, which is a pointer, yielding a ‘%struct.ST’ = ‘{ i32, double, %struct.RT }’ type, a structure. The second index indexes into the third element of the structure, yielding a ‘%struct.RT’ = ‘{ i8, [10 x [20 x i32]], i8 }’ type, another structure. The third index indexes into the second element of the structure, yielding a ‘[10 x [20 x i32]]’ type, an array. The two dimensions of the array are subscripted into, yielding an ‘i32’ type. The ‘getelementptr’ instruction returns a pointer to this element.

Note that it is perfectly legal to index partially through a structure, returning a pointer to an inner element. Because of this, the LLVM code for the given testcase is equivalent to:

define ptr @foo(ptr %s) {
  %t1 = getelementptr %struct.ST, ptr %s, i32 1
  %t2 = getelementptr %struct.ST, ptr %t1, i32 0, i32 2
  %t3 = getelementptr %struct.RT, ptr %t2, i32 0, i32 1
  %t4 = getelementptr [10 x [20 x i32]], ptr %t3, i32 0, i32 5
  %t5 = getelementptr [20 x i32], ptr %t4, i32 0, i32 13
  ret ptr %t5
}

The indices are first converted to offsets in the pointer’s index type. If the currently indexed type is a struct type, the struct offset corresponding to the index is sign-extended or truncated to the pointer index type. Otherwise, the index itself is sign-extended or truncated, and then multiplied by the type allocation size (that is, the size rounded up to the ABI alignment) of the currently indexed type.

The offsets are then added to the low bits of the base address up to the index type width, with silently-wrapping two’s complement arithmetic. If the pointer size is larger than the index size, this means that the bits outside the index type width will not be affected.
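The offset computation described above can be cross-checked in C for the foo example, assuming the C compiler lays out struct RT and struct ST the same way the target's data layout lays out %struct.RT and %struct.ST. The helper foo_offset is hypothetical; it sums one term per index (1, 2, 1, 5, 13), using offsetof for struct indices and sizeof scaling for the pointer and array indices.

```c
#include <stddef.h>

struct RT { char A; int B[10][20]; char C; };
struct ST { int X; double Y; struct RT Z; };

int *foo(struct ST *s) { return &s[1].Z.B[5][13]; }

/* Byte offset accumulated by the getelementptr indices 1, 2, 1, 5, 13. */
static size_t foo_offset(void) {
    return 1 * sizeof(struct ST)     /* index 1: scaled by the allocation size of %struct.ST */
         + offsetof(struct ST, Z)    /* index 2: struct field offset */
         + offsetof(struct RT, B)    /* index 1: struct field offset */
         + 5 * sizeof(int[20])       /* index 5: scaled by the size of [20 x i32] */
         + 13 * sizeof(int);         /* index 13: scaled by the size of i32 */
}
```

The address returned by foo differs from its argument by exactly foo_offset() bytes, and no memory is read to compute it.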

The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage. See the Pointer Aliasing Rules section for more information.

The getelementptr instruction may have a number of attributes that impose additional rules. If any of the rules are violated, the result value is a poison value. In cases where the base is a vector of pointers, the attributes apply to each computation element-wise.

For nusw (no unsigned signed wrap):

  • If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the signed value (trunc nsw).

  • The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (mul nsw).

  • The successive addition of each offset (without adding the base address) does not wrap the pointer index type in a signed sense (add nsw).

  • The successive addition of the current address, truncated to the pointer index type and interpreted as an unsigned number, and each offset, interpreted as a signed number, does not wrap the pointer index type.

For nuw (no unsigned wrap):

  • If the type of an index is larger than the pointer index type, the truncation to the pointer index type preserves the unsigned value (trunc nuw).

  • The multiplication of an index by the type size does not wrap the pointer index type in an unsigned sense (mul nuw).

  • The successive addition of each offset (without adding the base address) does not wrap the pointer index type in an unsigned sense (add nuw).

  • The successive addition of the current address, truncated to the pointer index type and interpreted as an unsigned number, and each offset, also interpreted as an unsigned number, does not wrap the pointer index type (add nuw).

For inbounds all rules of the nusw attribute apply. Additionally, if the getelementptr has any non-zero indices, the following rules apply:

  • The base pointer has an in bounds address of the allocated object that it is based on. This means that it points into that allocated object, or to its end. Note that the object does not have to be live anymore; being in-bounds of a deallocated object is sufficient.

  • During the successive addition of offsets to the address, the resulting pointer must remain in bounds of the allocated object at each step.

Note that getelementptr with all-zero indices is always considered to be inbounds, even if the base pointer does not point to an allocated object. As a corollary, the only pointer in bounds of the null pointer in the default address space is the null pointer itself.

If inbounds is present on a getelementptr instruction, the nusw attribute will be automatically set as well. For this reason, the nusw will also not be printed in textual IR if inbounds is already present.

If the inrange(Start, End) attribute is present, loading from or storing to any pointer derived from the getelementptr has undefined behavior if the load or store would access memory outside the half-open range [Start, End) from the getelementptr expression result. The result of a pointer comparison or ptrtoint (including ptrtoint-like operations involving memory) involving a pointer derived from a getelementptr with the inrange keyword is undefined, with the exception of comparisons in the case where both operands are in the closed range [Start, End]. Note that the inrange keyword is currently only allowed in constant getelementptr expressions.

The getelementptr instruction is often confusing. For some more insight into how it works, see the getelementptr FAQ.

Example:
%aptr = getelementptr {i32, [12 x i8]}, ptr %saptr, i64 0, i32 1
%vptr = getelementptr {i32, <2 x i8>}, ptr %svptr, i64 0, i32 1, i32 1
%eptr = getelementptr [12 x i8], ptr %aptr, i64 0, i32 1
%iptr = getelementptr [10 x i32], ptr @arr, i16 0, i16 0
Vector of pointers:

The getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such cases, all vector arguments should have the same number of elements, and every scalar argument will be effectively broadcast into a vector during address calculation.

; All arguments are vectors:
;   A[i] = ptrs[i] + offsets[i]*sizeof(i8)
%A = getelementptr i8, <4 x ptr> %ptrs, <4 x i64> %offsets

; Add the same scalar offset to each pointer of a vector:
;   A[i] = ptrs[i] + offset*sizeof(i8)
%A = getelementptr i8, <4 x ptr> %ptrs, i64 %offset

; Add distinct offsets to the same pointer:
;   A[i] = ptr + offsets[i]*sizeof(i8)
%A = getelementptr i8, ptr %ptr, <4 x i64> %offsets

; In all cases described above the type of the result is <4 x ptr>
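The per-lane formulas in the comments above can be modelled with ordinary C loops over four lanes. This is only a scalar sketch of the lane-wise arithmetic for the i8 element type (so the scale factor is 1); the names LANES, gep_vector_base and gep_scalar_base are hypothetical, and unlike the IR instruction, C pointer arithmetic requires the results to stay within the same array object.

```c
#include <stdint.h>

enum { LANES = 4 };

/* All arguments are vectors: out[i] = ptrs[i] + offsets[i] * sizeof(i8) */
static void gep_vector_base(char *out[LANES], char *const ptrs[LANES],
                            const int64_t offsets[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = ptrs[i] + offsets[i];
}

/* Scalar base broadcast to every lane: out[i] = ptr + offsets[i] * sizeof(i8) */
static void gep_scalar_base(char *out[LANES], char *ptr,
                            const int64_t offsets[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = ptr + offsets[i];
}
```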

The two following instructions are equivalent:

getelementptr  %struct.ST, <4 x ptr> %s, <4 x i64> %ind1,
  <4 x i32> <i32 2, i32 2, i32 2, i32 2>,
  <4 x i32> <i32 1, i32 1, i32 1, i32 1>,
  <4 x i32> %ind4,
  <4 x i64> <i64 13, i64 13, i64 13, i64 13>

getelementptr  %struct.ST, <4 x ptr> %s, <4 x i64> %ind1,
  i32 2, i32 1, <4 x i32> %ind4, i64 13

Let’s look at the C code, where the vector version of getelementptr makes sense:

// Let's assume that we vectorize the following loop:
double *A, *B; int *C;
for (int i = 0; i < size; ++i) {
  A[i] = B[C[i]];
}
; get pointers for 8 elements from array B
%ptrs = getelementptr double, ptr %B, <8 x i32> %C
; load 8 elements from array B into A
%A = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x ptr> %ptrs,
     i32 8, <8 x i1> %mask, <8 x double> %passthru)

Conversion Operations

The instructions in this category are the conversion instructions (casting) which all take a single operand and a type. They perform various bit conversions on the operand.

‘trunc .. to’ Instruction

Syntax:
<result> = trunc <ty> <value> to <ty2>             ; yields ty2
<result> = trunc nsw <ty> <value> to <ty2>         ; yields ty2
<result> = trunc nuw <ty> <value> to <ty2>         ; yields ty2
<result> = trunc nuw nsw <ty> <value> to <ty2>     ; yields ty2
Overview:

The ‘trunc’ instruction truncates its operand to the type ty2.

Arguments:

The ‘trunc’ instruction takes a value to trunc, and a type to trunc it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be larger than the bit size of the destination type, ty2. Equal sized types are not allowed.

Semantics:

The ‘trunc’ instruction truncates the high order bits in value and converts the remaining bits to ty2. Since the source size must be larger than the destination size, trunc cannot be a no-op cast. It will always truncate bits.

If the nuw keyword is present, and any of the truncated bits are non-zero, the result is a poison value. If the nsw keyword is present, and any of the truncated bits are not the same as the top bit of the truncation result, the result is a poison value.

Example:
%X = trunc i32 257 to i8                        ; yields i8:1
%Y = trunc i32 123 to i1                        ; yields i1:true
%Z = trunc i32 122 to i1                        ; yields i1:false
%W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8> ; yields <i8 8, i8 7>
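The truncation itself and the two poison conditions can be modelled in C for the specific case of i32 to i8. The helpers below are hypothetical and only classify inputs; C has no notion of poison, so the predicates simply report whether the corresponding flagged trunc would yield a poison value according to the rules above.

```c
#include <stdbool.h>
#include <stdint.h>

/* trunc i32 %x to i8: keep the low 8 bits. */
static uint8_t trunc_i32_to_i8(uint32_t x) {
    return (uint8_t)(x & 0xFFu);
}

/* nuw: poison if any of the 24 truncated bits is non-zero. */
static bool trunc_nuw_is_poison(uint32_t x) {
    return (x >> 8) != 0;
}

/* nsw: poison if any truncated bit differs from the top bit of the result. */
static bool trunc_nsw_is_poison(uint32_t x) {
    uint32_t high = x >> 8;                            /* the 24 truncated bits */
    uint32_t expect = (x & 0x80u) ? 0xFFFFFFu : 0u;    /* 24 copies of result bit 7 */
    return high != expect;
}
```

For example, 255 truncates cleanly under nuw but not under nsw, since the i8 result reads as -1 when interpreted as signed.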

‘zext .. to’ Instruction

Syntax:
<result> = zext <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘zext’ instruction zero extends its operand to type ty2.

The nneg (non-negative) flag, if present, specifies that the operand is non-negative. This property may be used by optimization passes to later convert the zext into a sext.

Arguments:

The ‘zext’ instruction takes a value to cast, and a type to cast it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be smaller than the bit size of the destination type, ty2.

Semantics:

The zext fills the high order bits of the value with zero bits until it reaches the size of the destination type, ty2.

When zero extending from i1, the result will always be either 0 or 1.

If the nneg flag is set, and the zext argument is negative, the result is a poison value.

Example:
%X = zext i32 257 to i64              ; yields i64:257
%Y = zext i1 true to i32              ; yields i32:1
%Z = zext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>

%a = zext nneg i8 127 to i16 ; yields i16 127
%b = zext nneg i8 -1 to i16  ; yields i16 poison

‘sext .. to’ Instruction

Syntax:
<result> = sext <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘sext’ instruction sign extends value to the type ty2.

Arguments:

The ‘sext’ instruction takes a value to cast, and a type to cast it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be smaller than the bit size of the destination type, ty2.

Semantics:

The ‘sext’ instruction performs a sign extension by copying the sign bit (highest order bit) of the value until it reaches the bit size of the type ty2.

When sign extending from i1, the extension always results in -1 or 0.

Example:
%X = sext i8 -1 to i16                          ; yields i16:-1 (bit pattern 0xFFFF, 65535 if read as unsigned)
%Y = sext i1 true to i32                        ; yields i32:-1
%Z = sext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
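
To contrast zero and sign extension, the following Python sketch models both on raw bit patterns. The helper names zext and sext, their parameters, and the use of None for poison are assumptions for illustration only.

```python
def zext(value, src_bits, nneg=False):
    """Zero extend: the source bit pattern is kept and the new high bits are 0."""
    bits = value & ((1 << src_bits) - 1)
    if nneg and bits >> (src_bits - 1):
        return None                      # nneg on a negative operand models poison
    return bits

def sext(value, src_bits):
    """Sign extend: the sign bit is replicated, so the signed value is unchanged."""
    bits = value & ((1 << src_bits) - 1)
    sign = bits >> (src_bits - 1)
    return bits - (1 << src_bits) if sign else bits
```

Masking the result of sext(-1, 8) with 0xFFFF gives 65535, the same i16 bit pattern read as an unsigned number.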

‘fptrunc .. to’ Instruction

Syntax:
<result> = fptrunc [fast-math flags]* <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘fptrunc’ instruction truncates value to type ty2.

Arguments:

The ‘fptrunc’ instruction takes a floating-point value to cast and a floating-point type to cast it to. The size of value must be larger than the size of ty2. This implies that fptrunc cannot be used to make a no-op cast.

Semantics:

The ‘fptrunc’ instruction casts a value from a larger floating-point type to a smaller floating-point type. This instruction is assumed to execute in the default floating-point environment.

NaN values follow the usual NaN behaviors, except that _if_ a NaN payload is propagated from the input (“Quieting NaN propagation” or “Unchanged NaN propagation” cases), then the low order bits of the NaN payload which cannot fit in the resulting type are discarded. Note that if discarding the low order bits leads to an all-0 payload, this cannot be represented as a signaling NaN (it would represent an infinity instead), so in that case “Unchanged NaN propagation” is not possible.

This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.

Example:
%X = fptrunc double 16777217.0 to float    ; yields float:16777216.0
%Y = fptrunc double 1.0E+300 to half       ; yields half:+infinity
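
As a rough illustration of the rounding and NaN payload rules, the Python sketch below narrows a double to a float through the standard struct module and models the payload rule on raw bit patterns. The function names are hypothetical, None is used here to mean "this payload cannot be propagated unchanged", and the sketch assumes the host's default round-to-nearest mode.

```python
import struct

def fptrunc_double_to_float(x):
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fptrunc_nan_payload(f64_bits):
    """Map a double NaN bit pattern to a float NaN bit pattern.

    The low 29 payload bits do not fit and are discarded. If nothing is
    left, the pattern would encode an infinity, so None is returned.
    """
    sign = f64_bits >> 63
    payload = (f64_bits & ((1 << 52) - 1)) >> 29
    if payload == 0:
        return None
    return (sign << 31) | (0xFF << 23) | payload
```

The first helper reproduces the %X example above: 16777217.0 is 2^24 + 1, which float cannot represent, so it rounds to 16777216.0.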

‘fpext .. to’ Instruction

Syntax:
<result> = fpext [fast-math flags]* <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘fpext’ instruction extends a floating-point value to a larger floating-point value.

Arguments:

The ‘fpext’ instruction takes a floating-point value to cast, and a floating-point type to cast it to. The source type must be smaller than the destination type.

Semantics:

The ‘fpext’ instruction extends the value from a smaller floating-point type to a larger floating-point type. The fpext cannot be used to make a no-op cast because it always changes bits. Use bitcast to make a no-op cast for a floating-point cast.

NaN values follow the usual NaN behaviors, except that _if_ a NaN payload is propagated from the input (“Quieting NaN propagation” or “Unchanged NaN propagation” cases), then it is copied to the high order bits of the resulting payload, and the remaining low order bits are zero.

This instruction can also take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.

Example:
%X = fpext float 3.125 to double         ; yields double:3.125000e+00
%Y = fpext double %X to fp128            ; yields fp128:0xL00000000000000004000900000000000
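
The payload rule for a propagated NaN can be sketched on raw bit patterns. The helper name fpext_nan_payload is hypothetical, and the sketch covers only the float to double case, where the 23-bit payload lands in the top of the 52-bit payload field.

```python
def fpext_nan_payload(f32_bits):
    """Map a float NaN bit pattern to the double NaN pattern with the same payload."""
    sign = f32_bits >> 31
    payload = f32_bits & ((1 << 23) - 1)
    # Copy the payload to the high order bits; the low 29 bits become zero.
    return (sign << 63) | (0x7FF << 52) | (payload << 29)
```

Under these assumptions the quiet NaN 0x7FC00001 extends to 0x7FF8000020000000.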

‘fptoui .. to’ Instruction

Syntax:
<result> = fptoui <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘fptoui’ instruction converts a floating-point value to its unsigned integer equivalent of type ty2.

Arguments:

The ‘fptoui’ instruction takes a value to cast, which must be a scalar or vector floating-point value, and a type to cast it to ty2, which must be an integer type. If ty is a vector floating-point type, ty2 must be a vector integer type with the same number of elements as ty.

Semantics:

The ‘fptoui’ instruction converts its floating-point operand into the nearest (rounding towards zero) unsigned integer value. If the value cannot fit in ty2, the result is a poison value.

Example:
%X = fptoui double 123.0 to i32      ; yields i32:123
%Y = fptoui float 1.0E+300 to i1     ; yields undefined:1
%Z = fptoui float 1.04E+17 to i8     ; yields undefined:1

‘fptosi .. to’ Instruction

Syntax:
<result> = fptosi <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘fptosi’ instruction converts floating-point value to type ty2.

Arguments:

The ‘fptosi’ instruction takes a value to cast, which must be a scalar or vector floating-point value, and a type to cast it to ty2, which must be an integer type. If ty is a vector floating-point type, ty2 must be a vector integer type with the same number of elements as ty.

Semantics:

The ‘fptosi’ instruction converts its floating-point operand into the nearest (rounding towards zero) signed integer value. If the value cannot fit in ty2, the result is a poison value.

Example:
%X = fptosi double -123.0 to i32      ; yields i32:-123
%Y = fptosi float 1.0E-247 to i1      ; yields undefined:1
%Z = fptosi float 1.04E+17 to i8      ; yields undefined:1
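
The round-towards-zero and range rules shared by fptoui and fptosi can be modelled as below. This is a sketch under stated assumptions: the function names mirror the instructions but are plain Python helpers, None stands for a poison value, and NaN and infinity are treated as values that cannot fit.

```python
import math

def fptoui(x, bits):
    """Round towards zero; None (poison) if the result is not in [0, 2**bits)."""
    if math.isnan(x) or math.isinf(x):
        return None
    v = math.trunc(x)
    return v if 0 <= v < (1 << bits) else None

def fptosi(x, bits):
    """Round towards zero; None (poison) if the result is not a signed bits-wide value."""
    if math.isnan(x) or math.isinf(x):
        return None
    v = math.trunc(x)
    return v if -(1 << (bits - 1)) <= v < (1 << (bits - 1)) else None
```

Rounding towards zero means that -123.9 converts to -123, not -124.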

‘uitofp .. to’ Instruction

Syntax:
<result> = uitofp <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘uitofp’ instruction regards value as an unsigned integer and converts that value to the ty2 type.

The nneg (non-negative) flag, if present, specifies that the operand is non-negative. This property may be used by optimization passes to later convert the uitofp into a sitofp.

Arguments:

The ‘uitofp’ instruction takes a value to cast, which must be a scalar or vector integer value, and a type to cast it to ty2, which must be a floating-point type. If ty is a vector integer type, ty2 must be a vector floating-point type with the same number of elements as ty.

Semantics:

The ‘uitofp’ instruction interprets its operand as an unsigned integer quantity and converts it to the corresponding floating-point value. If the value cannot be exactly represented, it is rounded using the default rounding mode.

If the nneg flag is set, and the uitofp argument is negative, the result is a poison value.

Example:
%X = uitofp i32 257 to float         ; yields float:257.0
%Y = uitofp i8 -1 to double          ; yields double:255.0

%a = uitofp nneg i32 256 to float    ; yields float:256.0
%b = uitofp nneg i32 -256 to float   ; yields float poison

‘sitofp .. to’ Instruction

Syntax:
<result> = sitofp <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘sitofp’ instruction regards value as a signed integer and converts that value to the ty2 type.

Arguments:

The ‘sitofp’ instruction takes a value to cast, which must be a scalar or vector integer value, and a type to cast it to ty2, which must be a floating-point type. If ty is a vector integer type, ty2 must be a vector floating-point type with the same number of elements as ty.

Semantics:

The ‘sitofp’ instruction interprets its operand as a signed integer quantity and converts it to the corresponding floating-point value. If the value cannot be exactly represented, it is rounded using the default rounding mode.

Example:
%X = sitofp i32 257 to float         ; yields float:257.0
%Y = sitofp i8 -1 to double          ; yields double:-1.0
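
The only difference between uitofp and sitofp is how the same bit pattern is read before conversion. The Python sketch below makes that explicit; the helper names, the bits parameter, and None for poison are assumptions for this example, and the destination type is modelled as double only, since Python floats are IEEE doubles.

```python
def uitofp(value, bits, nneg=False):
    """Read the bits-wide pattern as unsigned, then convert to a double."""
    raw = value & ((1 << bits) - 1)
    if nneg and raw >> (bits - 1):
        return None                  # nneg on a negative operand models poison
    return float(raw)                # rounds to nearest if not exactly representable

def sitofp(value, bits):
    """Read the bits-wide pattern as two's complement, then convert to a double."""
    raw = value & ((1 << bits) - 1)
    signed = raw - (1 << bits) if raw >> (bits - 1) else raw
    return float(signed)
```

The i8 pattern for -1 therefore converts to 255.0 with uitofp and to -1.0 with sitofp.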

‘ptrtoint .. to’ Instruction

Syntax:
<result> = ptrtoint <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘ptrtoint’ instruction converts the pointer or a vector of pointers value to the integer (or vector of integers) type ty2.

Arguments:

The ‘ptrtoint’ instruction takes a value to cast, which must be a value of type pointer or a vector of pointers, and a type to cast it to ty2, which must be an integer or a vector of integers type.

Semantics:

The ‘ptrtoint’ instruction converts value to integer type ty2 by interpreting all the pointer representation bits as an integer (equivalent to a bitcast) and either truncating or zero extending that value to the size of the integer type. If value is smaller than ty2 then a zero extension is done. If value is larger than ty2 then a truncation is done. If they are the same size, then nothing is done (no-op cast) other than a type change. The ptrtoint always captures address and provenance of the pointer argument.

Example:
%X = ptrtoint ptr %P to i8                         ; yields truncation on 32-bit architecture
%Y = ptrtoint ptr %P to i64                        ; yields zero extension on 32-bit architecture
%Z = ptrtoint <4 x ptr> %P to <4 x i64>            ; yields vector zero extension for a vector of addresses on 32-bit architecture

‘inttoptr .. to’ Instruction

Syntax:
<result> = inttoptr <ty> <value> to <ty2>[, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>]             ; yields ty2
Overview:

The ‘inttoptr’ instruction converts an integer value to a pointer type, ty2.

Arguments:

The ‘inttoptr’ instruction takes an integer value to cast, and a type to cast it to, which must be a pointer type.

The optional !dereferenceable metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. See dereferenceable metadata.

The optional !dereferenceable_or_null metadata must reference a single metadata name <deref_bytes_node> corresponding to a metadata node with one i64 entry. See dereferenceable_or_null metadata.

Semantics:

The ‘inttoptr’ instruction converts value to type ty2 by applying either a zero extension or a truncation depending on the size of the integer value. If value is larger than the size of a pointer then a truncation is done. If value is smaller than the size of a pointer then a zero extension is done. If they are the same size, nothing is done (no-op cast). The behavior is equivalent to a bitcast, however, the resulting value is not guaranteed to be dereferenceable (e.g. if the result type is a non-integral pointer).

Example:
%X = inttoptr i32 255 to ptr           ; yields zero extension on 64-bit architecture
%Y = inttoptr i32 255 to ptr           ; yields no-op on 32-bit architecture
%Z = inttoptr i64 0 to ptr             ; yields truncation on 32-bit architecture
%Z = inttoptr <4 x i32> %G to <4 x ptr>; yields truncation of vector G to four pointers
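
The size rules for ptrtoint and inttoptr are symmetric, which the following Python sketch tries to show. It models only the address bits of a pointer as a plain integer: provenance, dereferenceability, and non-integral pointers are outside what it can express. The helper names and the explicit width parameters are assumptions for illustration.

```python
def ptrtoint(addr, ptr_bits, int_bits):
    """Truncate if the integer type is narrower, zero extend if it is wider."""
    raw = addr & ((1 << ptr_bits) - 1)      # pointer representation bits
    return raw & ((1 << int_bits) - 1)      # high bits dropped, or left as zero

def inttoptr(value, int_bits, ptr_bits):
    """Truncate if the pointer is narrower, zero extend if it is wider."""
    raw = value & ((1 << int_bits) - 1)
    return raw & ((1 << ptr_bits) - 1)
```

With a 32-bit pointer, converting to i8 keeps only the low byte, and converting to i64 leaves the upper 32 bits zero.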

‘bitcast .. to’ Instruction

Syntax:
<result> = bitcast <ty> <value> to <ty2>             ; yields ty2
Overview:

The ‘bitcast’ instruction converts value to type ty2 without changing any bits.

Arguments:

The ‘bitcast’ instruction takes a value to cast, which must be a non-aggregate first class value, and a type to cast it to, which must also be a non-aggregate first class type. The bit sizes of value and the destination type, ty2, must be identical. If the source type is a pointer, the destination type must also be a pointer of the same size. This instruction supports bitwise conversion of vectors to integers and to vectors of other types (as long as they have the same size).

Semantics:

The ‘bitcast’ instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2. Pointer (or vector of pointers) types may only be converted to other pointer (or vector of pointers) types with the same address space through this instruction. To convert pointers to other types, use the inttoptr or ptrtoint instructions first.

There is a caveat for bitcasts involving vector types in relation to endianness. For example bitcast <2 x i8> <value> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian while element zero ends up in the most significant bits for big-endian.

Example:
%X = bitcast i8 255 to i8          ; yields i8 :-1
%Y = bitcast i32* %x to i16*       ; yields i16*:%x
%Z = bitcast <2 x i32> %V to i64   ; yields i64: %V (depends on endianness)
%Z = bitcast <2 x i32*> %V to <2 x i64*> ; yields <2 x i64*>
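
The endianness caveat for vector bitcasts follows from the "store to memory and read back" description. A small Python sketch, using the built-in int.from_bytes to stand in for the reload, shows the two outcomes; the helper name bitcast_2xi8_to_i16 is hypothetical.

```python
def bitcast_2xi8_to_i16(elements, byteorder):
    """Store a <2 x i8> vector in element order, then reload the bytes as one i16."""
    assert len(elements) == 2
    return int.from_bytes(bytes(elements), byteorder)
```

For elements [0x01, 0x02], the little-endian result is 0x0201 (element zero in the least significant bits) and the big-endian result is 0x0102.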

‘addrspacecast .. to’ Instruction

Syntax:
<result> = addrspacecast <pty> <ptrval> to <pty2>       ; yields pty2
Overview:

The ‘addrspacecast’ instruction converts ptrval from pty in address space n to type pty2 in address space m.

Arguments:

The ‘addrspacecast’ instruction takes a pointer or vector of pointer value to cast and a pointer type to cast it to, which must have a different address space.

Semantics:

The ‘addrspacecast’ instruction converts the pointer value ptrval to type pty2. It can be a no-op cast or a complex value modification, depending on the target and the address space pair. Pointer conversions within the same address space must be performed with the bitcast instruction. Note that if the address space conversion produces a dereferenceable result then both result and operand refer to the same memory location. The conversion must have no side effects, and must not capture the value of the pointer.

If the source is poison, the result is poison.

If the source is not poison, and both source and destination are integral pointers, and the result pointer is dereferenceable, the cast is assumed to be reversible (i.e. casting the result back to the original address space should yield the original bit pattern).

Which address space casts are supported depends on the target. Unsupported address space casts return poison.

Example:
%X = addrspacecast ptr %x to ptr addrspace(1)
%Y = addrspacecast ptr addrspace(1) %y to ptr addrspace(2)
%Z = addrspacecast <4 x ptr> %z to <4 x ptr addrspace(3)>

Other Operations

The instructions in this category are the “miscellaneous” instructions, which defy better classification.

‘icmp’ Instruction

Syntax:
<result> = icmp <cond> <ty> <op1>, <op2>            ; yields i1 or <N x i1>:result
<result> = icmp samesign <cond> <ty> <op1>, <op2>   ; yields i1 or <N x i1>:result
Overview:

The ‘icmp’ instruction returns a boolean value or a vector of boolean values based on comparison of its two integer, integer vector, pointer, or pointer vector operands.

Arguments:

The ‘icmp’ instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is not a value, just a keyword. The possible condition codes are:

  1. eq: equal

  2. ne: not equal

  3. ugt: unsigned greater than

  4. uge: unsigned greater or equal

  5. ult: unsigned less than

  6. ule: unsigned less or equal

  7. sgt: signed greater than

  8. sge: signed greater or equal

  9. slt: signed less than

  10. sle: signed less or equal

The remaining two arguments must be integer or pointer or integer vector typed. They must also be identical types.

Semantics:

The ‘icmp’ instruction compares op1 and op2 according to the condition code given as cond. The comparison performed always yields either an i1 or vector of i1 result, as follows:

  1. eq: yields true if the operands are equal, false otherwise. No sign interpretation is necessary or performed.

  2. ne: yields true if the operands are unequal, false otherwise. No sign interpretation is necessary or performed.

  3. ugt: interprets the operands as unsigned values and yields true if op1 is greater than op2.

  4. uge: interprets the operands as unsigned values and yields true if op1 is greater than or equal to op2.

  5. ult: interprets the operands as unsigned values and yields true if op1 is less than op2.

  6. ule: interprets the operands as unsigned values and yields true if op1 is less than or equal to op2.

  7. sgt: interprets the operands as signed values and yields true if op1 is greater than op2.

  8. sge: interprets the operands as signed values and yields true if op1 is greater than or equal to op2.

  9. slt: interprets the operands as signed values and yields true if op1 is less than op2.

  10. sle: interprets the operands as signed values and yields true if op1 is less than or equal to op2.

If the operands are pointer typed, the pointer values are compared as if they were integers.

If the operands are integer vectors, then they are compared element by element. The result is an i1 vector with the same number of elements as the values being compared. Otherwise, the result is an i1.

If the samesign keyword is present and the operands are not of the same sign then the result is a poison value.
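
The condition codes differ only in how the same bit patterns are interpreted. The following Python sketch models scalar icmp on bits-wide integers; the function name, the explicit bits parameter, and None for a poison result are assumptions made for illustration.

```python
def icmp(cond, a, b, bits, samesign=False):
    """Model scalar 'icmp'; returns True/False, or None for poison."""
    mask = (1 << bits) - 1
    ua, ub = a & mask, b & mask                       # unsigned interpretation
    if samesign and (ua >> (bits - 1)) != (ub >> (bits - 1)):
        return None                                   # operands differ in sign
    sa = ua - (1 << bits) if ua >> (bits - 1) else ua  # signed interpretation
    sb = ub - (1 << bits) if ub >> (bits - 1) else ub
    results = {
        "eq": ua == ub, "ne": ua != ub,
        "ugt": ua > ub, "uge": ua >= ub, "ult": ua < ub, "ule": ua <= ub,
        "sgt": sa > sb, "sge": sa >= sb, "slt": sa < sb, "sle": sa <= sb,
    }
    return results[cond]
```

For example, comparing i16 -4 with 5 is false under ule, because -4 reads as 65532 when unsigned, but true under sle.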

Example:
<result> = icmp eq i32 4, 5          ; yields: result=false
<result> = icmp ne ptr %X, %X        ; yields: result=false
<result> = icmp ult i16  4, 5        ; yields: result=true
<result> = icmp sgt i16  4, 5        ; yields: result=false
<result> = icmp ule i16 -4, 5        ; yields: result=false
<result> = icmp sge i16  4, 5        ; yields: result=false

‘fcmp’ Instruction

Syntax:
<result> = fcmp [fast-math flags]* <cond> <ty> <op1>, <op2>     ; yields i1 or <N x i1>:result
Overview:

The ‘fcmp’ instruction returns a boolean value or vector of boolean values based on comparison of its operands.

If the operands are floating-point scalars, then the result type is a boolean (i1).

If the operands are floating-point vectors, then the result type is a vector of boolean with the same number of elements as the operands being compared.

Arguments:

The ‘fcmp’ instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is not a value, just a keyword. The possible condition codes are:

  1. false: no comparison, always returns false

  2. oeq: ordered and equal

  3. ogt: ordered and greater than

  4. oge: ordered and greater than or equal

  5. olt: ordered and less than

  6. ole: ordered and less than or equal

  7. one: ordered and not equal

  8. ord: ordered (no nans)

  9. ueq: unordered or equal

  10. ugt: unordered or greater than

  11. uge: unordered or greater than or equal

  12. ult: unordered or less than

  13. ule: unordered or less than or equal

  14. une: unordered or not equal

  15. uno: unordered (either nans)

  16. true: no comparison, always returns true

Ordered means that neither operand is a QNAN while unordered means that either operand may be a QNAN.

Each of the op1 and op2 arguments must be either a floating-point type or a vector of floating-point type. They must have identical types.

Semantics:

The ‘fcmp’ instruction compares op1 and op2 according to the condition code given as cond. If the operands are vectors, then the vectors are compared element by element. Each comparison performed always yields an i1 result, as follows:

  1. false: always yields false, regardless of operands.

  2. oeq: yields true if both operands are not a QNAN and op1 is equal to op2.

  3. ogt: yields true if both operands are not a QNAN and op1 is greater than op2.

  4. oge: yields true if both operands are not a QNAN and op1 is greater than or equal to op2.

  5. olt: yields true if both operands are not a QNAN and op1 is less than op2.

  6. ole: yields true if both operands are not a QNAN and op1 is less than or equal to op2.

  7. one: yields true if both operands are not a QNAN and op1 is not equal to op2.

  8. ord: yields true if both operands are not a QNAN.

  9. ueq: yields true if either operand is a QNAN or op1 is equal to op2.

  10. ugt: yields true if either operand is a QNAN or op1 is greater than op2.

  11. uge: yields true if either operand is a QNAN or op1 is greater than or equal to op2.

  12. ult: yields true if either operand is a QNAN or op1 is less than op2.

  13. ule: yields true if either operand is a QNAN or op1 is less than or equal to op2.

  14. une: yields true if either operand is a QNAN or op1 is not equal to op2.

  15. uno: yields true if either operand is a QNAN.

  16. true: always yields true, regardless of operands.

The fcmp instruction can also optionally take any number of fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations.

Any set of fast-math flags are legal on an fcmp instruction, but the only flags that have any effect on its semantics are those that allow assumptions to be made about the values of input arguments; namely nnan, ninf, and reassoc. See Fast-Math Flags for more information.
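
The sixteen condition codes reduce to one rule: decide first whether the operands are unordered, then apply the relation. The Python sketch below models scalar fcmp without fast-math flags; the function name and the table _RELATIONS are assumptions for this example.

```python
import math
import operator

_RELATIONS = {
    "eq": operator.eq, "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le, "ne": operator.ne,
}

def fcmp(cond, a, b):
    """Model scalar 'fcmp' on Python floats (no fast-math flags)."""
    unordered = math.isnan(a) or math.isnan(b)
    if cond == "false":
        return False
    if cond == "true":
        return True
    if cond == "ord":
        return not unordered
    if cond == "uno":
        return unordered
    prefix, relation = cond[0], cond[1:]   # e.g. "olt" -> "o", "lt"
    if unordered:
        return prefix == "u"               # u* codes are true on NaN, o* codes false
    return _RELATIONS[relation](a, b)
```

With a NaN operand, fcmp("one", ...) is false while fcmp("une", ...) is true, which is the distinction the o and u prefixes encode.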

Example:
<result> = fcmp oeq float 4.0, 5.0    ; yields: result=false
<result> = fcmp one float 4.0, 5.0    ; yields: result=true
<result> = fcmp olt float 4.0, 5.0    ; yields: result=true
<result> = fcmp ueq double 1.0, 2.0   ; yields: result=false

‘phi’ Instruction

Syntax:
<result> = phi [fast-math-flags] <ty> [ <val0>, <label0>], ...
Overview:

The ‘phi’ instruction is used to implement the φ node in the SSA graph representing the function.

Arguments:

The type of the incoming values is specified with the first type field. After this, the ‘phi’ instruction takes a list of pairs as arguments, with one pair for each predecessor basic block of the current block. Only values of first class type may be used as the value arguments to the PHI node. Only labels may be used as the label arguments.

There must be no non-phi instructions between the start of a basic block and the PHI instructions: i.e. PHI instructions must be first in a basic block.

For the purposes of the SSA form, the use of each incoming value is deemed to occur on the edge from the corresponding predecessor block to the current block (but after any definition of an ‘invoke’ instruction’s return value on the same edge).

The optional fast-math-flags marker indicates that the phi has one or more fast-math-flags. These are optimization hints to enable otherwise unsafe floating-point optimizations. Fast-math-flags are only valid for phis that return supported floating-point types.

Semantics:

At runtime, the ‘phi’ instruction logically takes on the value specified by the pair corresponding to the predecessor basic block that executed just prior to the current block.

Example:
Loop:       ; Infinite loop that counts from 0 on up...
  %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
  %nextindvar = add i32 %indvar, 1
  br label %Loop
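
One way to read the example is as a tiny interpreter that remembers which block ran last. The Python sketch below walks the Loop block a fixed number of times (the real loop is infinite); the function name run_loop and the iterations bound are assumptions for illustration.

```python
def run_loop(iterations):
    """Simulate the Loop block, choosing the phi input by predecessor block."""
    predecessor = "LoopHeader"       # block that executed just before Loop
    nextindvar = None                # not yet defined on first entry
    observed = []
    for _ in range(iterations):
        # %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
        incoming = {"LoopHeader": 0, "Loop": nextindvar}
        indvar = incoming[predecessor]
        # %nextindvar = add i32 %indvar, 1
        nextindvar = indvar + 1
        observed.append(indvar)
        # br label %Loop
        predecessor = "Loop"
    return observed
```

On first entry the phi yields 0 because the predecessor is LoopHeader; on every later iteration it yields the previous %nextindvar.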

‘select’ Instruction

Syntax:
<result> = select [fast-math flags] selty <cond>, <ty> <val1>, <ty> <val2>             ; yields ty

selty is either i1 or {<N x i1>}
Overview:

The ‘select’ instruction is used to choose one value based on a condition, without IR-level branching.

Arguments:

The ‘select’ instruction requires an ‘i1’ value or a vector of ‘i1’ values indicating the condition, and two values of the same first class type.

  1. The optional fast-math flags marker indicates that the select has one or more fast-math flags. These are optimization hints to enable otherwise unsafe floating-point optimizations. Fast-math flags are only valid for selects that return supported floating-point types. Note that the presence of a value which would otherwise result in poison does not cause the result to be poison if the value is on the non-selected arm. If fast-math flags are present, they are only applied to the result, not both arms.

Semantics:

If the condition is an i1 and it evaluates to 1, the instruction returns the first value argument; otherwise, it returns the second value argument.

If the condition is a vector of i1, then the value arguments must be vectors of the same size, and the selection is done element by element.

If the condition is an i1 and the value arguments are vectors of the same size, then an entire vector is selected.

Example:
%X = select i1 true, i8 17, i8 42          ; yields i8:17
%Y = select nnan i1 true, float 0.0, float NaN ; yields float:0.0
%Z = select nnan i1 false, float 0.0, float NaN ; yields float:poison
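
The three cases in the semantics (scalar condition, vector condition, scalar condition with vector values) can be sketched in Python, with lists standing in for vectors. The function name select and the list representation are assumptions for this example; fast-math flags are not modelled.

```python
def select(cond, val1, val2):
    """Model 'select': a list condition selects lane by lane, a bool selects whole values."""
    if isinstance(cond, (list, tuple)):
        assert len(cond) == len(val1) == len(val2), "vectors must have the same size"
        return [a if c else b for c, a, b in zip(cond, val1, val2)]
    return val1 if cond else val2
```

A scalar condition with vector operands picks one entire vector, while a vector condition mixes lanes from both.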

‘freeze’ Instruction

Syntax:
<result> = freeze ty <val>    ; yields ty:result
Overview:

The ‘freeze’ instruction is used to stop propagation of undef and poison values.

Arguments:

The ‘freeze’ instruction takes a single argument.

Semantics:

If the argument is undef or poison, ‘freeze’ returns an arbitrary, but fixed, value of type ‘ty’. Otherwise, this instruction is a no-op and returns the input argument. All uses of a value returned by the same ‘freeze’ instruction are guaranteed to always observe the same value, while different ‘freeze’ instructions may yield different values.

While undef and poison pointers can be frozen, the result is a non-dereferenceable pointer. See the Pointer Aliasing Rules section for more information. If an aggregate value or vector is frozen, the operand is frozen element-wise. The padding of an aggregate isn’t considered, since it isn’t visible without storing it into memory and loading it with a different type.

Example:
%w = i32 undef
%x = freeze i32 %w
%y = add i32 %w, %w         ; undef
%z = add i32 %x, %x         ; even number because all uses of %x observe
                            ; the same value
%x2 = freeze i32 %w
%cmp = icmp eq i32 %x, %x2  ; can be true or false

; example with vectors
%v = <2 x i32> <i32 undef, i32 poison>
%a = extractelement <2 x i32> %v, i32 0    ; undef
%b = extractelement <2 x i32> %v, i32 1    ; poison
%add = add i32 %a, %a                      ; undef

%v.fr = freeze <2 x i32> %v                ; element-wise freeze
%d = extractelement <2 x i32> %v.fr, i32 0 ; not undef
%add.f = add i32 %d, %d                    ; even number

; branching on frozen value
%poison = add nsw i1 %k, undef   ; poison
%c = freeze i1 %poison
br i1 %c, label %foo, label %bar ; non-deterministic branch to %foo or %bar
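
The difference between using undef directly and using a frozen value can be imitated in Python. In this sketch, which is an analogy and not how LLVM represents these values, the hypothetical class Undef may produce a different number at every use, and freeze pins one arbitrary value down.

```python
import random

class Undef:
    """Stand-in for an undef value: each use may observe a different number."""
    def __init__(self, bits):
        self.bits = bits

    def use(self):
        return random.getrandbits(self.bits)

def freeze(value):
    """Return an arbitrary but fixed integer for Undef; pass other values through."""
    if isinstance(value, Undef):
        return value.use()       # chosen once, then an ordinary integer
    return value
```

Because x = freeze(Undef(32)) is an ordinary integer afterwards, x + x is always even, whereas two separate calls to Undef(32).use() need not agree.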

‘call’ Instruction

Syntax:
<result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] [addrspace(<num>)]
           <ty>|<fnty> <fnptrval>(<function args>) [fn attrs] [ operand bundles ]
Overview:

The ‘call’ instruction represents a simple function call.

Arguments:

This instruction requires several arguments:

  1. The optional tail and musttail markers indicate that the optimizers should perform tail call optimization. The tail marker is a hint that can be ignored. The musttail marker means that the call must be tail call optimized in order for the program to be correct. This is true even in the presence of attributes like “disable-tail-calls”. The musttail marker provides these guarantees:

    • The call will not cause unbounded stack growth if it is part of a recursive cycle in the call graph.

    • Arguments with the inalloca or preallocated attribute are forwarded in place.

    • If the musttail call appears in a function with the "thunk" attribute and the caller and callee both have varargs, then any unprototyped arguments in register or memory are forwarded to the callee. Similarly, the return value of the callee is returned to the caller’s caller, even if a void return type is in use.

    Both markers imply that the callee does not access allocas, va_args, or byval arguments from the caller. As an exception to that, an alloca or byval argument may be passed to the callee as a byval argument, which can be dereferenced inside the callee. For example:

    declare void @take_byval(ptr byval(i64))
    declare void @take_ptr(ptr)

    ; Invalid (assuming @take_ptr dereferences the pointer), because %local
    ; may be de-allocated before the call to @take_ptr.
    define void @invalid_alloca() {
    entry:
      %local = alloca i64
      tail call void @take_ptr(ptr %local)
      ret void
    }

    ; Valid, the byval attribute causes the memory allocated by %local to be
    ; copied into @take_byval's stack frame.
    define void @byval_alloca() {
    entry:
      %local = alloca i64
      tail call void @take_byval(ptr byval(i64) %local)
      ret void
    }

    ; Invalid, because @use_global_va_list uses the variadic arguments from
    ; @invalid_va_list.
    %struct.va_list = type { ptr }
    @va_list = external global %struct.va_list
    define void @use_global_va_list() {
    entry:
      %arg = va_arg ptr @va_list, i64
      ret void
    }
    define void @invalid_va_list(i32 %a, ...) {
    entry:
      call void @llvm.va_start.p0(ptr @va_list)
      tail call void @use_global_va_list()
      ret void
    }

    ; Valid, byval argument forwarded to tail call as another byval argument.
    define void @forward_byval(ptr byval(i64) %x) {
    entry:
      tail call void @take_byval(ptr byval(i64) %x)
      ret void
    }

    ; Invalid (assuming @take_ptr dereferences the pointer), byval argument
    ; passed to tail callee as non-byval ptr.
    define void @invalid_byval(ptr byval(i64) %x) {
    entry:
      tail call void @take_ptr(ptr %x)
      ret void
    }

    Calls marked musttail must obey the following additional rules:

    • The call must immediately precede a ret instruction, or a pointer bitcast followed by a ret instruction.

    • The ret instruction must return the (possibly bitcasted) value produced by the call, undef, or void.

    • The calling conventions of the caller and callee must match.

    • The callee must be varargs iff the caller is varargs. Bitcasting a non-varargs function to the appropriate varargs type is legal so long as the non-varargs prefixes obey the other rules.

    • The return type must not undergo automatic conversion to an sret pointer.

    In addition, if the calling convention is not swifttailcc or tailcc:

    • All ABI-impacting function attributes, such as sret, byval, inreg, returned, and inalloca, must match.

    • The caller and callee prototypes must match. Pointer types of parameters or return types must not differ in address space.

    On the other hand, if the calling convention is swifttailcc or tailcc:

    • Only these ABI-impacting attributes are allowed: sret, byval, swiftself, and swiftasync.

    • Prototypes are not required to match.

    Tail call optimization for calls marked tail is guaranteed to occur if the following conditions are met:

    • Caller and callee both have the calling convention fastcc or tailcc.

    • The call is in tail position (ret immediately follows call and ret uses value of call or is void).

    • Option -tailcallopt is enabled, llvm::GuaranteedTailCallOpt is true, or the calling convention is tailcc.

    • Platform-specific constraints are met.

  2. The optional notail marker indicates that the optimizers should not add tail or musttail markers to the call. It is used to prevent tail call optimization from being performed on the call.

  3. The optional fast-math flags marker indicates that the call has one or more fast-math flags, which are optimization hints to enable otherwise unsafe floating-point optimizations. Fast-math flags are only valid for calls that return supported floating-point types.

  4. The optional “cconv” marker indicates which calling convention the call should use. If none is specified, the call defaults to using C calling conventions. The calling convention of the call must match the calling convention of the target function, or else the behavior is undefined.

  5. The optional Parameter Attributes list for return values. Only ‘zeroext’, ‘signext’, ‘noext’, and ‘inreg’ attributes are valid here.

  6. The optional addrspace attribute can be used to indicate the address space of the called function. If it is not specified, the program address space from the datalayout string will be used.

  7. ‘ty’: the type of the call instruction itself which is also the type of the return value. Functions that return no value are marked void. The signature is computed based on the return type and argument types.

  8. ‘fnty’: shall be the signature of the function being called. The argument types must match the types implied by this signature. This is only required if the signature specifies a varargs type.

  9. ‘fnptrval’: An LLVM value containing a pointer to a function to be called. In most cases, this is a direct function call, but indirect calls are just as possible, calling an arbitrary pointer to function value.

  10. ‘function args’: argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.

  11. The optional function attributes list.

  12. The optional operand bundles list.

Semantics:

The ‘call’ instruction is used to cause control flow to transfer to a specified function, with its incoming arguments bound to the specified values. Upon a ‘ret’ instruction in the called function, control flow continues with the instruction after the function call, and the return value of the function is bound to the result argument.

If the callee refers to an intrinsic function, the signature of the call must match the signature of the callee. Otherwise, if the signature of the call does not match the signature of the called function, the behavior is target-specific. For a significant mismatch, this likely results in undefined behavior. LLVM interprocedural optimizations generally only optimize calls where the signature of the caller matches the signature of the callee.

Note that it is possible for the signatures to mismatch even if a call appears to be a “direct” call, like call void @f().
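As a minimal sketch of such a mismatch (the names @f and @caller are chosen for illustration only), the module below declares @f as taking one i32 argument, yet the call site uses the signature void (). The call looks like a direct call, but its signature differs from that of the callee, so the rules above for mismatched signatures apply:

```llvm
; Hypothetical declaration: @f expects a single i32 argument.
declare void @f(i32)

define void @caller() {
entry:
  ; The call site's signature is void (), which does not match void (i32).
  call void @f()
  ret void
}
```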

Example:
%retval = call i32 @test(i32 %argc)
call i32 (ptr, ...) @printf(ptr %msg, i32 12, i8 42)   ; yields i32
%X = tail call i32 @foo()                              ; yields i32
%Y = tail call fastcc i32 @foo()                       ; yields i32
call void %foo(i8 signext 97)

%struct.A = type { i32, i8 }
%r = call %struct.A @foo()                             ; yields { i32, i8 }
%gr = extractvalue %struct.A %r, 0                     ; yields i32
%gr1 = extractvalue %struct.A %r, 1                    ; yields i8
call void @foo() noreturn                              ; indicates that @foo never returns normally
%ZZ = call zeroext i32 @bar()                          ; return value is zero extended

LLVM treats calls to some functions with names and arguments that match the standard C99 library as being the C99 library functions, and may perform optimizations or generate code for them under that assumption. This is something we’d like to change in the future to provide better support for freestanding environments and non-C-based languages.

‘va_arg’ Instruction

Syntax:
<resultval> = va_arg <va_list*> <arglist>, <argty>
Overview:

The ‘va_arg’ instruction is used to access arguments passed through the “variable argument” area of a function call. It is used to implement the va_arg macro in C.

Arguments:

This instruction takes a va_list* value and the type of the argument. It returns a value of the specified argument type and increments the va_list to point to the next argument. The actual type of va_list is target specific.

Semantics:

The ‘va_arg’ instruction loads an argument of the specified type from the specified va_list and causes the va_list to point to the next argument. For more information, see the variable argument handling Intrinsic Functions.

It is legal for this instruction to be called in a function which does not take a variable number of arguments, for example, the vfprintf function.

va_arg is an LLVM instruction instead of an intrinsic function because it takes a type as an argument.

Example:

See the variable argument processing section.

Note that the code generator does not yet fully support va_arg on many targets. Also, it does not currently support va_arg with aggregate types on any target.

‘landingpad’ Instruction

Syntax:
<resultval> = landingpad <resultty> <clause>+
<resultval> = landingpad <resultty> cleanup <clause>*

<clause> := catch <type> <value>
<clause> := filter <array constant type> <array constant>
Overview:

The ‘landingpad’ instruction is used by LLVM’s exception handling system to specify that a basic block is a landing pad, that is, one where the exception lands, and corresponds to the code found in the catch portion of a try/catch sequence. It defines values supplied by the personality function upon re-entry to the function. The resultval has the type resultty.

Arguments:

The optional cleanup flag indicates that the landing pad block is a cleanup.

A clause begins with the clause type (catch or filter) and contains the global variable representing the “type” that may be caught or filtered respectively. Unlike the catch clause, the filter clause takes an array constant as its argument. Use “[0 x ptr] undef” for a filter which cannot throw. The ‘landingpad’ instruction must contain at least one clause or the cleanup flag.

Semantics:

The ‘landingpad’ instruction defines the values which are set by the personality function upon re-entry to the function, and therefore the “result type” of the landingpad instruction. As with calling conventions, how the personality function results are represented in LLVM IR is target specific.

The clauses are applied in order from top to bottom. If two landingpad instructions are merged together through inlining, the clauses from the calling function are appended to the list of clauses. When the call stack is being unwound due to an exception being thrown, the exception is compared against each clause in turn. If it doesn’t match any of the clauses, and the cleanup flag is not set, then unwinding continues further up the call stack.

The landingpad instruction has several restrictions:

  • A landing pad block is a basic block which is the unwind destination of an ‘invoke’ instruction.

  • A landing pad block must have a ‘landingpad’ instruction as its first non-PHI instruction.

  • There can be only one ‘landingpad’ instruction within the landing pad block.

  • A basic block that is not a landing pad block may not include a ‘landingpad’ instruction.

Example:
;; A landing pad which can catch an integer.
%res = landingpad { ptr, i32 }
         catch ptr @_ZTIi
;; A landing pad that is a cleanup.
%res = landingpad { ptr, i32 }
         cleanup
;; A landing pad which can catch an integer and can only throw a double.
%res = landingpad { ptr, i32 }
         catch ptr @_ZTIi
         filter [1 x ptr] [ptr @_ZTId]
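The fragments above omit their surroundings. The following sketch places a landing pad in a complete function, so that the restrictions listed earlier (unwind destination of an invoke, first non-PHI instruction) are visible. It assumes the Itanium C++ ABI personality routine @__gxx_personality_v0; the names @may_throw and @try_call are hypothetical.

```llvm
; Type info object for int in the Itanium C++ ABI (defined elsewhere).
@_ZTIi = external constant ptr

; Hypothetical callee that may throw.
declare void @may_throw()
declare i32 @__gxx_personality_v0(...)

define i32 @try_call() personality ptr @__gxx_personality_v0 {
entry:
  invoke void @may_throw()
          to label %cont unwind label %lpad

cont:
  ret i32 0

lpad:
  ; Must be the first non-PHI instruction of the unwind destination.
  %res = landingpad { ptr, i32 }
          catch ptr @_ZTIi
  ; Element 1 holds the selector value supplied by the personality function.
  %sel = extractvalue { ptr, i32 } %res, 1
  ret i32 %sel
}
```

A real frontend would normally compare the selector against the expected type id and resume unwinding on a mismatch; that logic is left out here to keep the sketch short.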

‘catchpad’ Instruction

Syntax:
<resultval> = catchpad within <catchswitch> [<args>*]
Overview:

The ‘catchpad’ instruction is used by LLVM’s exception handling system to specify that a basic block begins a catch handler, that is, one where a personality routine attempts to transfer control to catch an exception.

Arguments:

The catchswitch operand must always be a token produced by a catchswitch instruction in a predecessor block. This ensures that each catchpad has exactly one predecessor block, and it always terminates in a catchswitch.

The args correspond to whatever information the personality routine requires to know if this is an appropriate handler for the exception. Control will transfer to the catchpad if this is the first appropriate handler for the exception.

The resultval has the type token and is used to match the catchpad to corresponding catchrets and other nested EH pads.

Semantics:

When the call stack is being unwound due to an exception being thrown, the exception is compared against the args. If it doesn’t match, control will not reach the catchpad instruction. The representation of args is entirely target and personality function-specific.

Like the landingpad instruction, the catchpad instruction must be the first non-phi of its parent basic block.

The meaning of the tokens produced and consumed by catchpad and other “pad” instructions is described in the Windows exception handling documentation.

When a catchpad has been “entered” but not yet “exited” (as described in the EH documentation), it is undefined behavior to execute a call or invoke that does not carry an appropriate “funclet” bundle.

Example:
dispatch:
  %cs = catchswitch within none [label %handler0] unwind to caller
  ;; A catch block which can catch an integer.
handler0:
  %tok = catchpad within %cs [ptr @_ZTIi]
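For context, here is a sketch of a whole function built around a catchswitch/catchpad pair. It assumes the MSVC C++ personality routine @__CxxFrameHandler3, and the three catchpad arguments follow the shape commonly used with that personality for a catch-all handler; that shape is an assumption about the personality, not something this section defines. The names @may_throw and @catch_all are hypothetical.

```llvm
; Hypothetical callee that may throw.
declare void @may_throw()
declare i32 @__CxxFrameHandler3(...)

define void @catch_all() personality ptr @__CxxFrameHandler3 {
entry:
  invoke void @may_throw()
          to label %cont unwind label %dispatch

dispatch:
  ; The catchswitch produces the token consumed by the catchpad below.
  %cs = catchswitch within none [label %handler0] unwind to caller

handler0:
  ; First non-phi instruction of the handler block.
  %tok = catchpad within %cs [ptr null, i32 64, ptr null]
  ; Leave the handler and continue normal execution.
  catchret from %tok to label %cont

cont:
  ret void
}
```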

‘cleanuppad’ Instruction

Syntax:
<resultval> = cleanuppad within <parent> [<args>*]
Overview:

The ‘cleanuppad’ instruction is used by LLVM’s exception handling system to specify that a basic block is a cleanup block, that is, one where a personality routine attempts to transfer control to run cleanup actions. The args correspond to whatever additional information the personality function requires to execute the cleanup. The resultval has the type token and is used to match the cleanuppad to corresponding cleanuprets. The parent argument is the token of the funclet that contains the cleanuppad instruction. If the cleanuppad is not inside a funclet, this operand may be the token none.

Arguments:

The instruction takes a list of arbitrary values which are interpreted by the personality function.

Semantics:

When the call stack is being unwound due to an exception being thrown, the personality function transfers control to the cleanuppad with the aid of the personality-specific arguments. As with calling conventions, how the personality function results are represented in LLVM IR is target specific.

The cleanuppad instruction has several restrictions:

  • A cleanup block is a basic block which is the unwind destination of an exceptional instruction.

  • A cleanup block must have a ‘cleanuppad’ instruction as its first non-PHI instruction.

  • There can be only one ‘cleanuppad’ instruction within the cleanup block.

  • A basic block that is not a cleanup block may not include a ‘cleanuppad’ instruction.

When a cleanuppad has been “entered” but not yet “exited” (as described in the EH documentation), it is undefined behavior to execute a call or invoke that does not carry an appropriate “funclet” bundle.

Example:
%tok = cleanuppad within %cs []
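The following sketch shows a cleanuppad in a complete function, including the “funclet” bundle that calls inside the cleanup must carry. It assumes the MSVC C++ personality routine @__CxxFrameHandler3; the names @may_throw, @run_dtor and @with_cleanup are hypothetical.

```llvm
; Hypothetical callee that may throw, and a hypothetical cleanup action.
declare void @may_throw()
declare void @run_dtor()
declare i32 @__CxxFrameHandler3(...)

define void @with_cleanup() personality ptr @__CxxFrameHandler3 {
entry:
  invoke void @may_throw()
          to label %cont unwind label %cleanup

cleanup:
  ; Not nested in another funclet, so the parent token is none.
  %tok = cleanuppad within none []
  ; Calls made while the pad is entered carry a "funclet" bundle.
  call void @run_dtor() [ "funclet"(token %tok) ]
  ; Exit the cleanup and continue unwinding into the caller.
  cleanupret from %tok unwind to caller

cont:
  ret void
}
```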

Debug Records

Debug records appear interleaved with instructions, but are not instructions; they are used only to define debug information, and have no effect on generated code. They are distinguished from instructions by the use of a leading # and an extra level of indentation. As an example:

  %inst1 = op1 %a, %b
    #dbg_value(%inst1, !10, !DIExpression(), !11)
  %inst2 = op2 %inst1, %c

These debug records replace the prior debug intrinsics. Debug records will be disabled if --experimental-debuginfo-iterators=false is passed to LLVM; it is an error for both records and intrinsics to appear in the same module. More information about debug records can be found in the LLVM Source Level Debugging document.

Intrinsic Functions

LLVM supports the notion of an “intrinsic function”. These functions have well known names and semantics and are required to follow certain restrictions. Overall, these intrinsics represent an extension mechanism for the LLVM language that does not require changing all of the transformations in LLVM when adding to the language (or the bitcode reader/writer, the parser, etc…).

Intrinsic function names must all start with an “llvm.” prefix. This prefix is reserved in LLVM for intrinsic names; thus, function names may not begin with this prefix. Intrinsic functions must always be external functions: you cannot define the body of intrinsic functions. Intrinsic functions may only be used in call or invoke instructions: it is illegal to take the address of an intrinsic function. Additionally, because intrinsic functions are part of the LLVM language, it is required if any are added that they be documented here.

Some intrinsic functions can be overloaded, i.e., the intrinsic represents a family of functions that perform the same operation but on different data types. Because LLVM can represent over 8 million different integer types, overloading is used commonly to allow an intrinsic function to operate on any integer type. One or more of the argument types or the result type can be overloaded to accept any integer type. Argument types may also be defined as exactly matching a previous argument’s type or the result type. This allows an intrinsic function which accepts multiple arguments, but needs all of them to be of the same type, to only be overloaded with respect to a single argument or the result.

An overloaded intrinsic has the names of its overloaded argument types encoded into its function name, each preceded by a period. Only those types which are overloaded result in a name suffix. Arguments whose type is matched against another type do not. For example, the llvm.ctpop function can take an integer of any width and returns an integer of exactly the same integer width. This leads to a family of functions such as i8 @llvm.ctpop.i8(i8 %val) and i29 @llvm.ctpop.i29(i29 %val). Only one type, the return type, is overloaded, and only one type suffix is required. Because the argument’s type is matched against the return type, it does not require its own name suffix.
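To make the naming rule concrete, the sketch below declares three members of the llvm.ctpop family and calls one of them. The wrapper name @popcount8 is hypothetical, and the vector declaration assumes the usual v<N><element type> spelling for vector suffixes, which this section does not spell out.

```llvm
; One suffix per overloaded type; the argument type is matched against
; the return type and therefore adds no suffix of its own.
declare i8 @llvm.ctpop.i8(i8)
declare i29 @llvm.ctpop.i29(i29)
declare <4 x i32> @llvm.ctpop.v4i32(<4 x i32>)

; Hypothetical wrapper that counts the set bits of an 8-bit value.
define i8 @popcount8(i8 %val) {
entry:
  %n = call i8 @llvm.ctpop.i8(i8 %val)
  ret i8 %n
}
```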

Unnamed types are encoded as s_s. Overloaded intrinsics that depend on an unnamed type in one of their overloaded argument types get an additional .<number> suffix. This allows differentiating intrinsics with different unnamed types as arguments. (For example: llvm.ssa.copy.p0s_s.2(%42*)) The number is tracked in the LLVM module and it ensures unique names in the module. While linking together two modules, it is still possible to get a name clash. In that case one of the names will be changed by getting a new number.

For target developers who are defining intrinsics for back-end code generation, any intrinsic overloads based solely on the distinction between integer or floating point types should not be relied upon for correct code generation. In such cases, the recommended approach for target maintainers when defining intrinsics is to create separate integer and FP intrinsics rather than rely on overloading. For example, if different codegen is required for llvm.target.foo(<4 x i32>) and llvm.target.foo(<4 x float>) then these should be split into different intrinsics.

To learn how to add an intrinsic function, please see the Extending LLVM Guide.

Variable Argument Handling Intrinsics

Variable argument support is defined in LLVM with the va_arg instruction and these three intrinsic functions. These functions are related to the similarly named macros defined in the <stdarg.h> header file.

All of these functions take as arguments pointers to a target-specific value type “va_list”. The LLVM assembly language reference manual does not define what this type is, so all transformations should be prepared to handle these functions regardless of the type used. The intrinsics are overloaded, and can be used for pointers to different address spaces.

This example shows how the va_arg instruction and the variable argument handling intrinsic functions are used.

; This struct is different for every platform. For most platforms,
; it is merely a ptr.
%struct.va_list = type { ptr }

; For Unix x86_64 platforms, va_list is the following struct:
; %struct.va_list = type { i32, i32, ptr, ptr }

define i32 @test(i32 %X, ...) {
  ; Initialize variable argument processing
  %ap = alloca %struct.va_list
  call void @llvm.va_start.p0(ptr %ap)

  ; Read a single integer argument
  %tmp = va_arg ptr %ap, i32

  ; Demonstrate usage of llvm.va_copy and llvm.va_end
  %aq = alloca ptr
  call void @llvm.va_copy.p0(ptr %aq, ptr %ap)
  call void @llvm.va_end.p0(ptr %aq)

  ; Stop processing of arguments.
  call void @llvm.va_end.p0(ptr %ap)
  ret i32 %tmp
}

declare void @llvm.va_start.p0(ptr)
declare void @llvm.va_copy.p0(ptr, ptr)
declare void @llvm.va_end.p0(ptr)

‘llvm.va_start’ Intrinsic

Syntax:
declare void @llvm.va_start.p0(ptr <arglist>)
declare void @llvm.va_start.p5(ptr addrspace(5) <arglist>)
Overview:

The ‘llvm.va_start’ intrinsic initializes <arglist> for subsequent use by va_arg.

Arguments:

The argument is a pointer to a va_list element to initialize.

Semantics:

The ‘llvm.va_start’ intrinsic works just like the va_start macro available in C. In a target-dependent way, it initializes the va_list element to which the argument points, so that the next call to va_arg will produce the first variable argument passed to the function. Unlike the C va_start macro, this intrinsic does not need to know the last argument of the function as the compiler can figure that out.

‘llvm.va_end’ Intrinsic

Syntax:
declare void @llvm.va_end.p0(ptr <arglist>)
declare void @llvm.va_end.p5(ptr addrspace(5) <arglist>)
Overview:

The ‘llvm.va_end’ intrinsic destroys <arglist>, which has been initialized previously with llvm.va_start or llvm.va_copy.

Arguments:

The argument is a pointer to a va_list to destroy.

Semantics:

The ‘llvm.va_end’ intrinsic works just like the va_end macro available in C. In a target-dependent way, it destroys the va_list element to which the argument points. Calls to llvm.va_start and llvm.va_copy must be matched exactly with calls to llvm.va_end.

‘llvm.va_copy’ Intrinsic

Syntax:
declare void @llvm.va_copy.p0(ptr <destarglist>, ptr <srcarglist>)
declare void @llvm.va_copy.p5(ptr addrspace(5) <destarglist>, ptr addrspace(5) <srcarglist>)
Overview:

The ‘llvm.va_copy’ intrinsic copies the current argument position from the source argument list to the destination argument list.

Arguments:

The first argument is a pointer to a va_list element to initialize. The second argument is a pointer to a va_list element to copy from. The address spaces of the two arguments must match.

Semantics:

The ‘llvm.va_copy’ intrinsic works just like the va_copy macro available in C. In a target-dependent way, it copies the source va_list element into the destination va_list element. This intrinsic is necessary because the llvm.va_start intrinsic may be arbitrarily complex and require, for example, memory allocation.

Accurate Garbage Collection Intrinsics

LLVM’s support for Accurate Garbage Collection (GC) requires the frontend to generate code containing appropriate intrinsic calls and select an appropriate GC strategy which knows how to lower these intrinsics in a manner which is appropriate for the target collector.

These intrinsics allow identification of GC roots on the stack, as well as garbage collector implementations that require read and write barriers. Frontends for type-safe garbage collected languages should generate these intrinsics to make use of the LLVM garbage collectors. For more details, see Garbage Collection with LLVM.

LLVM provides a second, experimental set of intrinsics for describing garbage collection safepoints in compiled code. These intrinsics are an alternative to the llvm.gcroot intrinsics, but are compatible with the ones for read and write barriers. The differences in approach are covered in the Garbage Collection with LLVM documentation. The intrinsics themselves are described in Garbage Collection Safepoints in LLVM.

‘llvm.gcroot’ Intrinsic

Syntax:
declare void @llvm.gcroot(ptr %ptrloc, ptr %metadata)
Overview:

The ‘llvm.gcroot’ intrinsic declares the existence of a GC root to the code generator, and allows some metadata to be associated with it.

Arguments:

The first argument specifies the address of a stack object that contains the root pointer. The second pointer (which must be either a constant or a global value address) contains the meta-data to be associated with the root.

Semantics:

At runtime, a call to this intrinsic stores a null pointer into the “ptrloc” location. At compile-time, the code generator generates information to allow the runtime to find the pointer at GC safe points. The ‘llvm.gcroot’ intrinsic may only be used in a function which specifies a GC algorithm.
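A minimal sketch of registering a root, assuming the built-in "shadow-stack" GC strategy and a hypothetical function name @with_root. The root is a stack slot holding a pointer, and no metadata is attached, so the second argument is null:

```llvm
declare void @llvm.gcroot(ptr %ptrloc, ptr %metadata)

; The enclosing function must specify a GC algorithm.
define void @with_root() gc "shadow-stack" {
entry:
  ; Stack object that will contain the root pointer.
  %root = alloca ptr
  ; Declare the slot as a GC root with no associated metadata.
  call void @llvm.gcroot(ptr %root, ptr null)
  ret void
}
```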

‘llvm.gcread’ Intrinsic

Syntax:
declare ptr @llvm.gcread(ptr %ObjPtr, ptr %Ptr)
Overview:

The ‘llvm.gcread’ intrinsic identifies reads of references from heap locations, allowing garbage collector implementations that require read barriers.

Arguments:

The second argument is the address to read from, which should be an address allocated from the garbage collector. The first argument is a pointer to the start of the referenced object, if needed by the language runtime (otherwise null).

Semantics:

The ‘llvm.gcread’ intrinsic has the same semantics as a load instruction, but may be replaced with substantially more complex code by the garbage collector runtime, as needed. The ‘llvm.gcread’ intrinsic may only be used in a function which specifies a GC algorithm.

‘llvm.gcwrite’ Intrinsic

Syntax:
declare void @llvm.gcwrite(ptr %P1, ptr %Obj, ptr %P2)
Overview:

The ‘llvm.gcwrite’ intrinsic identifies writes of references to heap locations, allowing garbage collector implementations that require write barriers (such as generational or reference counting collectors).

Arguments:

The first argument is the reference to store, the second is the start of the object to store it to, and the third is the address of the field of Obj to store to. If the runtime does not require a pointer to the object, Obj may be null.

Semantics:

The ‘llvm.gcwrite’ intrinsic has the same semantics as a store instruction, but may be replaced with substantially more complex code by the garbage collector runtime, as needed. The ‘llvm.gcwrite’ intrinsic may only be used in a function which specifies a GC algorithm.
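The argument order of the two barrier intrinsics is easy to mix up, so here is a small sketch that uses both. The function names @load_ref and @store_ref are hypothetical, and the built-in "shadow-stack" strategy is assumed only because the enclosing functions must name some GC algorithm.

```llvm
declare ptr @llvm.gcread(ptr %ObjPtr, ptr %Ptr)
declare void @llvm.gcwrite(ptr %P1, ptr %Obj, ptr %P2)

; Read a reference from a field: object start first, field address second.
define ptr @load_ref(ptr %obj, ptr %field) gc "shadow-stack" {
entry:
  %ref = call ptr @llvm.gcread(ptr %obj, ptr %field)
  ret ptr %ref
}

; Write a reference to a field: value first, then object start, then field address.
define void @store_ref(ptr %val, ptr %obj, ptr %field) gc "shadow-stack" {
entry:
  call void @llvm.gcwrite(ptr %val, ptr %obj, ptr %field)
  ret void
}
```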

‘llvm.experimental.gc.statepoint’ Intrinsic

Syntax:
declare token
  @llvm.experimental.gc.statepoint(i64 <id>, i32 <num patch bytes>,
                 ptr elementtype(func_type) <target>,
                 i64 <#call args>, i64 <flags>,
                 ... (call parameters),
                 i64 0, i64 0)
Overview:

The statepoint intrinsic represents a call which is parse-able by the runtime.

Operands:

The ‘id’ operand is a constant integer that is reported as the ID field in the generated stackmap. LLVM does not interpret this parameter in any way and its meaning is up to the statepoint user to decide. Note that LLVM is free to duplicate code containing statepoint calls, and this may transform IR that had a unique ‘id’ per lexical call to statepoint to IR that does not.

If ‘num patch bytes’ is non-zero then the call instruction corresponding to the statepoint is not emitted and LLVM emits ‘num patch bytes’ bytes of nops in its place. LLVM will emit code to prepare the function arguments and retrieve the function return value in accordance to the calling convention; the former before the nop sequence and the latter after the nop sequence. It is expected that the user will patch over the ‘num patch bytes’ bytes of nops with a calling sequence specific to their runtime before executing the generated machine code. There are no guarantees with respect to the alignment of the nop sequence. Unlike Stack maps and patch points in LLVM, statepoints do not have a concept of shadow bytes. Note that semantically the statepoint still represents a call or invoke to ‘target’, and the nop sequence after patching is expected to represent an operation equivalent to a call or invoke to ‘target’.

The ‘target’ operand is the function actually being called. The operand must have an elementtype attribute specifying the function type of the target. The target can be specified as either a symbolic LLVM function, or as an arbitrary Value of pointer type. Note that the function type must match the signature of the callee and the types of the ‘call parameters’ arguments.

The ‘#call args’ operand is the number of arguments to the actual call. It must exactly match the number of arguments passed in the ‘call parameters’ variable length section.

The ‘flags’ operand is used to specify extra information about the statepoint. This is currently only used to mark certain statepoints as GC transitions. This operand is a 64-bit integer with the following layout, where bit 0 is the least significant bit:

Bit #    Usage
0        Set if the statepoint is a GC transition, cleared otherwise.
1-63     Reserved for future use; must be cleared.

The ‘call parameters’ arguments are simply the arguments which need to be passed to the call target. They will be lowered according to the specified calling convention and otherwise handled like a normal call instruction. The number of arguments must exactly match what is specified in ‘#call args’. The types must match the signature of ‘target’.

The ‘call parameter’ attributes must be followed by two ‘i64 0’ constants. These were originally the length prefixes for ‘gc transition parameter’ and ‘deopt parameter’ arguments, but the role of these parameter sets has been entirely replaced with the corresponding operand bundles. In a future revision, these now redundant arguments will be removed.

Semantics:

A statepoint is assumed to read and write all memory. As a result, memory operations cannot be reordered past a statepoint. It is illegal to mark a statepoint as being either ‘readonly’ or ‘readnone’.

Note that legal IR cannot perform any memory operation on a ‘gc pointer’ argument of the statepoint in a location statically reachable from the statepoint. Instead, the explicitly relocated value (from a gc.relocate) must be used.

‘llvm.experimental.gc.result’ Intrinsic

Syntax:
declare type
  @llvm.experimental.gc.result(token %statepoint_token)
Overview:

gc.result extracts the result of the original call instruction which was replaced by the gc.statepoint. The gc.result intrinsic is actually a family of three intrinsics due to an implementation limitation. Other than the type of the return value, the semantics are the same.

Operands:

The first and only argument is the gc.statepoint which starts the safepoint sequence of which this gc.result is a part. Despite the typing of this as a generic token, only the value defined by a gc.statepoint is legal here.

Semantics:

The gc.result represents the return value of the call target of the statepoint. The type of the gc.result must exactly match the type of the target. If the call target returns void, there will be no gc.result.

A gc.result is modeled as a ‘readnone’ pure function. It has no side effects since it is just a projection of the return value of the previous call represented by the gc.statepoint.

‘llvm.experimental.gc.relocate’ Intrinsic

Syntax:
declare <pointer type>
  @llvm.experimental.gc.relocate(token %statepoint_token,
                                 i32 %base_offset,
                                 i32 %pointer_offset)
Overview:

A gc.relocate returns the potentially relocated value of a pointer at the safepoint.

Operands:

The first argument is the gc.statepoint which starts the safepoint sequence of which this gc.relocate is a part. Despite the typing of this as a generic token, only the value defined by a gc.statepoint is legal here.

The second and third arguments are both indices into operands of the corresponding statepoint’s gc-live operand bundle.

The second argument is an index which specifies the allocation for the pointer being relocated. The associated value must be within the object with which the pointer being relocated is associated. The optimizer is free to change which interior derived pointer is reported, provided that it does not replace an actual base pointer with another interior derived pointer. Collectors are allowed to rely on the base pointer operand remaining an actual base pointer if so constructed.

The third argument is an index which specifies the (potentially) derived pointer being relocated. It is legal for this index to be the same as the second argument if-and-only-if a base pointer is being relocated.

Semantics:

The return value of gc.relocate is the potentially relocated value of the pointer specified by its arguments. It is unspecified how the value of the returned pointer relates to the argument to the gc.statepoint other than that a) it points to the same source language object with the same offset, and b) the ‘based-on’ relationship of the newly relocated pointers is a projection of the unrelocated pointers. In particular, the integer value of the pointer returned is unspecified.

A gc.relocate is modeled as a readnone pure function. It has no side effects since it is just a way to extract information about work done during the actual call modeled by the gc.statepoint.

‘llvm.experimental.gc.get.pointer.base’ Intrinsic

Syntax:
declare <pointer type>
  @llvm.experimental.gc.get.pointer.base(
    <pointer type> readnone captures(none) %derived_ptr)
    nounwind willreturn memory(none)
Overview:

gc.get.pointer.base for a derived pointer returns its base pointer.

Operands:

The only argument is a pointer which is based on some object with an unknown offset from the base of said object.

Semantics:

This intrinsic is used in the abstract machine model for GC to represent the base pointer for an arbitrary derived pointer.

This intrinsic is inlined by the RewriteStatepointsForGC pass by replacing all uses of this callsite with the offset of a derived pointer from its base pointer value. The replacement is done as part of the lowering to the explicit statepoint model.

The return pointer type must be the same as the type of the parameter.

‘llvm.experimental.gc.get.pointer.offset’ Intrinsic

Syntax:
declare i64
  @llvm.experimental.gc.get.pointer.offset(
    <pointer type> readnone captures(none) %derived_ptr)
    nounwind willreturn memory(none)
Overview:

gc.get.pointer.offset for a derived pointer returns the offset from its base pointer.

Operands:

The only argument is a pointer which is based on some object with an unknown offset from the base of said object.

Semantics:

This intrinsic is used in the abstract machine model for GC to represent the offset of an arbitrary derived pointer from its base pointer.

This intrinsic is inlined by the RewriteStatepointsForGC pass by replacing all uses of this callsite with the offset of a derived pointer from its base pointer value. The replacement is done as part of the lowering to the explicit statepoint model.

Essentially, this call computes the difference between the derived pointer and its base pointer (see ‘llvm.experimental.gc.get.pointer.base’ Intrinsic), with both cast using ptrtoint. However, performing this cast outside the RewriteStatepointsForGC pass could cause the pointers to be lost during further lowering from the abstract model to the explicit physical one.

Code Generator Intrinsics

These intrinsics are provided by LLVM to expose special features that may only be implemented with code generator support.

‘llvm.returnaddress’ Intrinsic

Syntax:
declare ptr @llvm.returnaddress(i32 <level>)
Overview:

The ‘llvm.returnaddress’ intrinsic attempts to compute a target-specific value indicating the return address of the current function or one of its callers.

Arguments:

The argument to this intrinsic indicates which function to return the address for. Zero indicates the calling function, one indicates its caller, etc. The argument is required to be a constant integer value.

Semantics:

The ‘llvm.returnaddress’ intrinsic either returns a pointer indicating the return address of the specified call frame, or zero if it cannot be identified. The value returned by this intrinsic is likely to be incorrect or 0 for arguments other than zero, so it should only be used for debugging purposes.

Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller.
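A minimal sketch of a call to this intrinsic, using a hypothetical wrapper named @current_return_address. The level is the constant 0, the only value for which the result is likely to be meaningful:

```llvm
declare ptr @llvm.returnaddress(i32)

; Hypothetical wrapper returning the address this function will return to.
define ptr @current_return_address() {
entry:
  ; The level must be a constant integer; 0 selects the current function.
  %ra = call ptr @llvm.returnaddress(i32 0)
  ret ptr %ra
}
```

Because the intrinsic does not block inlining, a wrapper like this may report a different address once it has been inlined into its caller.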

‘llvm.addressofreturnaddress’ Intrinsic

Syntax:
declare ptr @llvm.addressofreturnaddress()
Overview:

The ‘llvm.addressofreturnaddress’ intrinsic returns a target-specific pointer to the place in the stack frame where the return address of the current function is stored.

Semantics:

Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller.

This intrinsic is only implemented for x86 and aarch64.

‘llvm.sponentry’ Intrinsic

Syntax:
declare ptr @llvm.sponentry()
Overview:

The ‘llvm.sponentry’ intrinsic returns the stack pointer value at the entry of the current function calling this intrinsic.

Semantics:

Note this intrinsic is only verified on AArch64 and ARM.

‘llvm.frameaddress’ Intrinsic

Syntax:
declare ptr @llvm.frameaddress(i32 <level>)
Overview:

The ‘llvm.frameaddress’ intrinsic attempts to return the target-specific frame pointer value for the specified stack frame.

Arguments:

The argument to this intrinsic indicates which function to return the frame pointer for. Zero indicates the calling function, one indicates its caller, etc. The argument is required to be a constant integer value.

Semantics:

The ‘llvm.frameaddress’ intrinsic either returns a pointer indicating the frame address of the specified call frame, or zero if it cannot be identified. The value returned by this intrinsic is likely to be incorrect or 0 for arguments other than zero, so it should only be used for debugging purposes.

Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller.

‘llvm.swift.async.context.addr’ Intrinsic

Syntax:
declare ptr @llvm.swift.async.context.addr()
Overview:

The ‘llvm.swift.async.context.addr’ intrinsic returns a pointer to the part of the extended frame record containing the asynchronous context of a Swift execution.

Semantics:

If the caller has a swiftasync parameter, that argument will initially be stored at the returned address. If not, it will be initialized to null.

‘llvm.localescape’ and ‘llvm.localrecover’ Intrinsics

Syntax:
declare void @llvm.localescape(...)
declare ptr @llvm.localrecover(ptr %func, ptr %fp, i32 %idx)
Overview:

The ‘llvm.localescape’ intrinsic escapes offsets of a collection of static allocas, and the ‘llvm.localrecover’ intrinsic applies those offsets to a live frame pointer to recover the address of the allocation. The offset is computed during frame layout of the caller of llvm.localescape.

Arguments:

All arguments to ‘llvm.localescape’ must be pointers to static allocas or casts of static allocas. Each function can only call ‘llvm.localescape’ once, and it can only do so from the entry block.

The func argument to ‘llvm.localrecover’ must be a constant bitcasted pointer to a function defined in the current module. The code generator cannot determine the frame allocation offset of functions defined in other modules.

The fp argument to ‘llvm.localrecover’ must be a frame pointer of a call frame that is currently live. The return value of ‘llvm.localaddress’ is one way to produce such a value, but various runtimes also expose a suitable pointer in platform-specific ways.

The idx argument to ‘llvm.localrecover’ indicates which alloca passed to ‘llvm.localescape’ to recover. It is zero-indexed.

Semantics:

These intrinsics allow a group of functions to share access to a set of local stack allocations of one parent function. The parent function may call the ‘llvm.localescape’ intrinsic once from the function entry block, and the child functions can use ‘llvm.localrecover’ to access the escaped allocas. The ‘llvm.localescape’ intrinsic blocks inlining, as inlining changes where the escaped allocas are allocated, which would break attempts to use ‘llvm.localrecover’.
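
As a rough illustration of the parent/child arrangement described above, the following sketch escapes two allocas in a parent and recovers the second one in a child. The function names @parent and @child are hypothetical, and passing the result of ‘llvm.localaddress’ as an ordinary argument is only one assumed way of handing the frame pointer to the child.

```llvm
declare void @llvm.localescape(...)
declare ptr @llvm.localrecover(ptr, ptr, i32)
declare ptr @llvm.localaddress()

define void @parent() {
entry:
  %x = alloca i32
  %y = alloca i32
  ; Must appear once, in the entry block, with static allocas only.
  call void (...) @llvm.localescape(ptr %x, ptr %y)
  store i32 0, ptr %y
  ; Hand the child a frame pointer for this (still live) frame.
  %fp = call ptr @llvm.localaddress()
  call void @child(ptr %fp)
  ret void
}

define internal void @child(ptr %fp) {
entry:
  ; Index 1 selects %y, the second pointer passed to llvm.localescape.
  %y.addr = call ptr @llvm.localrecover(ptr @parent, ptr %fp, i32 1)
  store i32 42, ptr %y.addr
  ret void
}
```

The child is only meaningful while the frame of @parent is live, which is why the call happens before @parent returns.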

‘llvm.seh.try.begin’ and ‘llvm.seh.try.end’ Intrinsics

Syntax:
declare void @llvm.seh.try.begin()
declare void @llvm.seh.try.end()
Overview:

The ‘llvm.seh.try.begin’ and ‘llvm.seh.try.end’ intrinsics mark the boundary of a _try region for Windows SEH Asynchronous Exception Handling.

Semantics:

When a C function is compiled with the Windows SEH Asynchronous Exception option, -feh_asynch (aka MSVC -EHa), these two intrinsics are injected to mark the _try boundary and to prevent potential exceptions from being moved across the boundary. Any set of operations can then be confined to the region by reading their leaf inputs via volatile loads and writing their root outputs via volatile stores.

‘llvm.seh.scope.begin’ and ‘llvm.seh.scope.end’ Intrinsics

Syntax:
declare void @llvm.seh.scope.begin()
declare void @llvm.seh.scope.end()
Overview:

The ‘llvm.seh.scope.begin’ and ‘llvm.seh.scope.end’ intrinsics mark the boundary of a CPP object lifetime for Windows SEH Asynchronous Exception Handling (MSVC option -EHa).

Semantics:

LLVM’s ordinary exception-handling representation associates EH cleanups and handlers only with invokes, which normally correspond only to call sites. To support arbitrary faulting instructions, it must be possible to recover the current EH scope for any instruction. Turning every operation in LLVM that could fault into an invoke of a new, potentially-throwing intrinsic would require adding a large number of intrinsics, impede optimization of those operations, and make compilation slower by introducing many extra basic blocks. These intrinsics can be used instead to mark the region protected by a cleanup, such as for a local C++ object with a non-trivial destructor. llvm.seh.scope.begin is used to mark the start of the region; it is always called with invoke, with the unwind block being the desired unwind destination for any potentially-throwing instructions within the region. llvm.seh.scope.end is used to mark when the scope ends and the EH cleanup is no longer required (e.g. because the destructor is being called).

‘llvm.read_register’, ‘llvm.read_volatile_register’, and ‘llvm.write_register’ Intrinsics

Syntax:
declare i32 @llvm.read_register.i32(metadata)
declare i64 @llvm.read_register.i64(metadata)
declare i32 @llvm.read_volatile_register.i32(metadata)
declare i64 @llvm.read_volatile_register.i64(metadata)
declare void @llvm.write_register.i32(metadata, i32 @value)
declare void @llvm.write_register.i64(metadata, i64 @value)
!0 = !{!"sp\00"}
Overview:

The ‘llvm.read_register’, ‘llvm.read_volatile_register’, and ‘llvm.write_register’ intrinsics provide access to the named register. The register must be valid on the architecture being compiled to. The type needs to be compatible with the register being read.

Semantics:

The ‘llvm.read_register’ and ‘llvm.read_volatile_register’ intrinsics return the current value of the register, where possible. The ‘llvm.write_register’ intrinsic sets the current value of the register, where possible.

A call to ‘llvm.read_volatile_register’ is assumed to have side-effects and possibly return a different value each time (e.g. for a timer register).

This is useful to implement named register global variables that need to always be mapped to a specific register, as is common practice on bare-metal programs including OS kernels.

The compiler doesn’t check for register availability or use of the used register in surrounding code, including inline assembly. Because of that, allocatable registers are not supported.

Warning: So far it only works with the stack pointer on selected architectures (ARM, AArch64, PowerPC and x86_64). A significant amount of work is needed to support other registers and even more so, allocatable registers.
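
A minimal sketch of reading the stack pointer through a named-register metadata node follows. The function name @get_stack_pointer is hypothetical, and the register string "sp" is an assumption that fits targets such as AArch64; other targets spell the stack pointer differently.

```llvm
declare i64 @llvm.read_register.i64(metadata)

define i64 @get_stack_pointer() {
entry:
  ; !0 names the register; the i64 suffix must be compatible with its width.
  %sp = call i64 @llvm.read_register.i64(metadata !0)
  ret i64 %sp
}

!0 = !{!"sp\00"}
```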

‘llvm.stacksave’ Intrinsic

Syntax:
declare ptr @llvm.stacksave.p0()
declare ptr addrspace(5) @llvm.stacksave.p5()
Overview:

The ‘llvm.stacksave’ intrinsic is used to remember the current state of the function stack, for use with llvm.stackrestore. This is useful for implementing language features like scoped automatic variable sized arrays in C99.

Semantics:

This intrinsic returns an opaque pointer value that can be passed to llvm.stackrestore. When an llvm.stackrestore intrinsic is executed with a value saved from llvm.stacksave, it effectively restores the state of the stack to the state it was in when the llvm.stacksave intrinsic executed. In practice, this pops any alloca blocks from the stack that were allocated after the llvm.stacksave was executed. The address space should typically be the alloca address space.

‘llvm.stackrestore’ Intrinsic

Syntax:
declare void @llvm.stackrestore.p0(ptr %ptr)
declare void @llvm.stackrestore.p5(ptr addrspace(5) %ptr)
Overview:

The ‘llvm.stackrestore’ intrinsic is used to restore the state of the function stack to the state it was in when the corresponding llvm.stacksave intrinsic executed. This is useful for implementing language features like scoped automatic variable sized arrays in C99. The address space should typically be the alloca address space.

Semantics:

See the description for llvm.stacksave.
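
The save/restore pairing can be sketched as follows for a scope that contains a variable sized array. The names @scoped_vla and @use are hypothetical, and address space 0 is assumed to be the alloca address space.

```llvm
declare ptr @llvm.stacksave.p0()
declare void @llvm.stackrestore.p0(ptr)
declare void @use(ptr)

define void @scoped_vla(i64 %n) {
entry:
  ; Remember the stack state on entry to the scope.
  %saved = call ptr @llvm.stacksave.p0()
  ; Dynamic alloca made after the save point.
  %vla = alloca i32, i64 %n, align 4
  call void @use(ptr %vla)
  ; Leaving the scope pops %vla; it must not be used afterwards.
  call void @llvm.stackrestore.p0(ptr %saved)
  ret void
}
```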

‘llvm.get.dynamic.area.offset’ Intrinsic

Syntax:
declare i32 @llvm.get.dynamic.area.offset.i32()
declare i64 @llvm.get.dynamic.area.offset.i64()
Overview:

The ‘llvm.get.dynamic.area.offset.*’ intrinsic family is used to get the offset from native stack pointer to the address of the most recent dynamic alloca on the caller’s stack. These intrinsics are intended for use in combination with llvm.stacksave to get a pointer to the most recent dynamic alloca. This is useful, for example, for AddressSanitizer’s stack unpoisoning routines.

Semantics:

These intrinsics return a non-negative integer value that can be used to get the address of the most recent dynamic alloca, allocated by alloca on the caller’s stack. In particular, for targets where stack grows downwards, adding this offset to the native stack pointer would get the address of the most recent dynamic alloca. For targets where stack grows upwards, the situation is a bit more complicated, because subtracting this value from stack pointer would get the address one past the end of the most recent dynamic alloca.

Although for most targets llvm.get.dynamic.area.offset returns just a zero, for others, such as PowerPC and PowerPC64, it returns a compile-time-known constant value.

The return value type of llvm.get.dynamic.area.offset must match the target’s alloca address space type.

‘llvm.prefetch’ Intrinsic

Syntax:
declare void @llvm.prefetch(ptr <address>, i32 <rw>, i32 <locality>, i32 <cache type>)
Overview:

The ‘llvm.prefetch’ intrinsic is a hint to the code generator to insert a prefetch instruction if supported; otherwise, it is a no-op. Prefetches have no effect on the behavior of the program but can change its performance characteristics.

Arguments:

address is the address to be prefetched, rw is the specifier determining if the fetch should be for a read (0) or write (1), and locality is a temporal locality specifier ranging from (0) - no locality, to (3) - extremely local keep in cache. The cache type specifies whether the prefetch is performed on the data (1) or instruction (0) cache. The rw, locality and cache type arguments must be constant integers.

Semantics:

This intrinsic does not modify the behavior of the program. In particular, prefetches cannot trap and do not produce a value. On targets that support this intrinsic, the prefetch can provide hints to the processor cache for better performance.
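
As a small sketch of how the three constant arguments combine, the call below requests a read prefetch into the data cache with the highest temporal locality. The function name @prefetch_for_read is hypothetical, and the .p0 suffix assumes the mangled form for a pointer in address space 0.

```llvm
declare void @llvm.prefetch.p0(ptr, i32, i32, i32)

define void @prefetch_for_read(ptr %p) {
entry:
  ; rw = 0 (read), locality = 3 (keep in cache), cache type = 1 (data)
  call void @llvm.prefetch.p0(ptr %p, i32 0, i32 3, i32 1)
  ret void
}
```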

‘llvm.pcmarker’ Intrinsic

Syntax:
declare void @llvm.pcmarker(i32 <id>)
Overview:

The ‘llvm.pcmarker’ intrinsic is a method to export a Program Counter (PC) in a region of code to simulators and other tools. The method is target specific, but it is expected that the marker will use exported symbols to transmit the PC of the marker. The marker makes no guarantees that it will remain with any specific instruction after optimizations. It is possible that the presence of a marker will inhibit optimizations. The intended use is to be inserted after optimizations to allow correlations of simulation runs.

Arguments:

id is a numerical id identifying the marker.

Semantics:

This intrinsic does not modify the behavior of the program. Backends that do not support this intrinsic may ignore it.

‘llvm.readcyclecounter’ Intrinsic

Syntax:
declare i64 @llvm.readcyclecounter()
Overview:

The ‘llvm.readcyclecounter’ intrinsic provides access to the cycle counter register (or similar low latency, high accuracy clocks) on those targets that support it. On X86, it should map to RDTSC. On Alpha, it should map to RPCC. As the backing counters overflow quickly (on the order of 9 seconds on Alpha), this should only be used for small timings.

Semantics:

When directly supported, reading the cycle counter should not modify any memory. Implementations are allowed to either return an application specific value or a system wide value. On backends without support, this is lowered to a constant 0.

Note that runtime support may be conditional on the privilege level the code is running at and on the host platform.
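
A short-interval measurement might look like the sketch below, which subtracts two counter readings around a call. The names @time_work and @work are hypothetical, and the result is only meaningful on targets that actually support the counter.

```llvm
declare i64 @llvm.readcyclecounter()
declare void @work()

define i64 @time_work() {
entry:
  %start = call i64 @llvm.readcyclecounter()
  call void @work()
  %end = call i64 @llvm.readcyclecounter()
  ; Elapsed ticks; on backends without support both reads are 0.
  %elapsed = sub i64 %end, %start
  ret i64 %elapsed
}
```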

‘llvm.readsteadycounter’ Intrinsic

Syntax:
declare i64 @llvm.readsteadycounter()
Overview:

The ‘llvm.readsteadycounter’ intrinsic provides access to the fixed frequency clock on targets that support it. Unlike ‘llvm.readcyclecounter’, this clock is expected to tick at a constant rate, making it suitable for measuring elapsed time. The actual frequency of the clock is implementation defined.

Semantics:

When directly supported, reading the steady counter should not modify any memory. Implementations are allowed to either return an application specific value or a system wide value. On backends without support, this is lowered to a constant 0.

‘llvm.clear_cache’ Intrinsic

Syntax:
declare void @llvm.clear_cache(ptr, ptr)
Overview:

The ‘llvm.clear_cache’ intrinsic ensures visibility of modifications in the specified range to the execution unit of the processor. On targets with non-unified instruction and data cache, the implementation flushes the instruction cache.

Semantics:

On platforms with coherent instruction and data caches (e.g. x86), this intrinsic is a nop. On platforms with non-coherent instruction and data cache (e.g. ARM, MIPS), the intrinsic is lowered either to appropriate instructions or a system call, if cache flushing requires special privileges.

The default behavior is to emit a call to __clear_cache from the runtime library.

This intrinsic does not empty the instruction pipeline. Modifications of the current function are outside the scope of the intrinsic.

‘llvm.instrprof.increment’ Intrinsic

Syntax:
declare void @llvm.instrprof.increment(ptr <name>, i64 <hash>, i32 <num-counters>, i32 <index>)
Overview:

The ‘llvm.instrprof.increment’ intrinsic can be emitted by a frontend for use with instrumentation based profiling. These will be lowered by the -instrprof pass to generate execution counts of a program at runtime.

Arguments:

The first argument is a pointer to a global variable containing the name of the entity being instrumented. This should generally be the (mangled) function name for a set of counters.

The second argument is a hash value that can be used by the consumer of the profile data to detect changes to the instrumented source, and the third is the number of counters associated with name. It is an error if hash or num-counters differ between two instances of instrprof.increment that refer to the same name.

The last argument refers to which of the counters for name should be incremented. It should be a value between 0 and num-counters.

Semantics:

This intrinsic represents an increment of a profiling counter. It will cause the -instrprof pass to generate the appropriate data structures and the code to increment the appropriate value, in a format that can be written out by a compiler runtime and consumed via the llvm-profdata tool.

The intrinsic is lowered differently for contextual profiling by the -ctx-instr-lower pass. Here:

  • the entry basic block increment counter is lowered as a call to compiler-rt, to either __llvm_ctx_profile_start_context or __llvm_ctx_profile_get_context. Either returns a pointer to a context object which contains a buffer into which counter increments can happen. Note that the pointer value returned by compiler-rt may have its LSB set - counter increments happen offset from the address with the LSB cleared.

  • all the other lowerings of llvm.instrprof.increment[.step] happen within that context.

  • the context is assumed to be a local value to the function, and no concurrency concerns need to be handled by LLVM.
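
The argument conventions above can be sketched with a function that owns two counters, one for its entry block and one for a conditional block. The global @__profn_foo, the function @foo, and the hash value 0 are illustrative assumptions; a frontend would supply its own name variable and hash.

```llvm
@__profn_foo = private constant [3 x i8] c"foo"

declare void @llvm.instrprof.increment(ptr, i64, i32, i32)

define void @foo(i1 %c) {
entry:
  ; Counter 0 of 2: counts executions of the entry block.
  call void @llvm.instrprof.increment(ptr @__profn_foo, i64 0, i32 2, i32 0)
  br i1 %c, label %then, label %exit

then:
  ; Counter 1 of 2: same name, hash, and num-counters as above.
  call void @llvm.instrprof.increment(ptr @__profn_foo, i64 0, i32 2, i32 1)
  br label %exit

exit:
  ret void
}
```

Both calls agree on the hash and on num-counters, as required for instances that refer to the same name.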

‘llvm.instrprof.increment.step’ Intrinsic

Syntax:
declare void @llvm.instrprof.increment.step(ptr <name>, i64 <hash>, i32 <num-counters>, i32 <index>, i64 <step>)
Overview:

The ‘llvm.instrprof.increment.step’ intrinsic is an extension to the ‘llvm.instrprof.increment’ intrinsic with an additional fifth argument to specify the step of the increment.

Arguments:

The first four arguments are the same as for the ‘llvm.instrprof.increment’ intrinsic.

The last argument specifies the value of the increment of the counter variable.

Semantics:

See description of ‘llvm.instrprof.increment’ intrinsic.

‘llvm.instrprof.callsite’ Intrinsic

Syntax:
declare void @llvm.instrprof.callsite(ptr <name>, i64 <hash>, i32 <num-counters>, i32 <index>, ptr <callsite>)
Overview:

The ‘llvm.instrprof.callsite’ intrinsic should be emitted before a callsite that’s not to a “fake” callee (like another intrinsic or asm). It is used by contextual profiling and has side-effects. Its lowering happens in IR, and target-specific backends should never encounter it.

Arguments:

The first 4 arguments are similar to llvm.instrprof.increment. The indexing is specific to callsites, meaning callsites are indexed from 0, independent from the indexes used by the other intrinsics (such as llvm.instrprof.increment[.step]).

The last argument is the called value of the callsite this intrinsic precedes.

Semantics:

This is lowered by contextual profiling. In contextual profiling, functions get, from compiler-rt, a pointer to a context object. The context object consists of a buffer LLVM can use to perform counter increments (i.e. the lowering of llvm.instrprof.increment[.step]). The address range following the counter buffer, <num-counters> x sizeof(ptr) - sized, is expected to contain pointers to contexts of functions called from this function (“subcontexts”). LLVM does not dereference into that memory region, just calculates GEPs.

The lowering of llvm.instrprof.callsite consists of:

  • write to __llvm_ctx_profile_expected_callee the <callsite> value;

  • write to __llvm_ctx_profile_callsite the address into this function’s context of the <index> position into the subcontexts region.

__llvm_ctx_profile_{expected_callee|callsite} are initialized by compiler-rt and are TLS. They are both vectors of pointers of size 2. The index into each is determined when the current function obtains the pointer to its context from compiler-rt. The pointer’s LSB gives the index.

‘llvm.instrprof.timestamp’ Intrinsic

Syntax:
declare void @llvm.instrprof.timestamp(ptr <name>, i64 <hash>, i32 <num-counters>, i32 <index>)
Overview:

The ‘llvm.instrprof.timestamp’ intrinsic is used to implement temporal profiling.

Arguments:

The arguments are the same as ‘llvm.instrprof.increment’. The index is expected to always be zero.

Semantics:

Similar to the ‘llvm.instrprof.increment’ intrinsic, but it stores a timestamp representing when this function was executed for the first time.

‘llvm.instrprof.cover’ Intrinsic

Syntax:
declare void @llvm.instrprof.cover(ptr <name>, i64 <hash>, i32 <num-counters>, i32 <index>)
Overview:

The ‘llvm.instrprof.cover’ intrinsic is used to implement coverage instrumentation.

Arguments:

The arguments are the same as the first four arguments of ‘llvm.instrprof.increment’.

Semantics:

Similar to the ‘llvm.instrprof.increment’ intrinsic, but it stores zero to the profiling variable to signify that the function has been covered. We store zero because this is more efficient on some targets.

‘llvm.instrprof.value.profile’ Intrinsic

Syntax:
declare void @llvm.instrprof.value.profile(ptr <name>, i64 <hash>, i64 <value>, i32 <value_kind>, i32 <index>)
Overview:

The ‘llvm.instrprof.value.profile’ intrinsic can be emitted by a frontend for use with instrumentation based profiling. This will be lowered by the -instrprof pass to find out the target values that instrumented expressions take in a program at runtime.

Arguments:

The first argument is a pointer to a global variable containing the name of the entity being instrumented. name should generally be the (mangled) function name for a set of counters.

The second argument is a hash value that can be used by the consumer of the profile data to detect changes to the instrumented source. It is an error if hash differs between two instances of llvm.instrprof.* that refer to the same name.

The third argument is the value of the expression being profiled. The profiled expression’s value should be representable as an unsigned 64-bit value. The fourth argument represents the kind of value profiling that is being done. The supported value profiling kinds are enumerated through the InstrProfValueKind type declared in the <include/llvm/ProfileData/InstrProf.h> header file. The last argument is the index of the instrumented expression within name. It should be >= 0.

Semantics:

This intrinsic represents the point where a call to a runtime routine should be inserted for value profiling of target expressions. The -instrprof pass will generate the appropriate data structures and replace the llvm.instrprof.value.profile intrinsic with the call to the profile runtime library with proper arguments.

‘llvm.instrprof.mcdc.parameters’ Intrinsic

Syntax:
declare void @llvm.instrprof.mcdc.parameters(ptr <name>, i64 <hash>, i32 <bitmap-bits>)
Overview:

The ‘llvm.instrprof.mcdc.parameters’ intrinsic is used to initiate MC/DC code coverage instrumentation for a function.

Arguments:

The first argument is a pointer to a global variable containing the name of the entity being instrumented. This should generally be the (mangled) function name for a set of counters.

The second argument is a hash value that can be used by the consumer of the profile data to detect changes to the instrumented source.

The third argument is the number of bitmap bits required by the function to record the number of test vectors executed for each boolean expression.

Semantics:

This intrinsic represents basic MC/DC parameters initiating one or more MC/DC instrumentation sequences in a function. It will cause the -instrprof pass to generate the appropriate data structures and the code to instrument MC/DC test vectors in a format that can be written out by a compiler runtime and consumed via the llvm-profdata tool.

‘llvm.instrprof.mcdc.tvbitmap.update’ Intrinsic

Syntax:
declare void @llvm.instrprof.mcdc.tvbitmap.update(ptr <name>, i64 <hash>, i32 <bitmap-index>, ptr <mcdc-temp-addr>)
Overview:

The ‘llvm.instrprof.mcdc.tvbitmap.update’ intrinsic is used to track MC/DC test vector execution after each boolean expression has been fully executed. The overall value of the condition bitmap, after it has been successively updated with the true or false evaluation of each condition, uniquely identifies an executed MC/DC test vector and is used as a bit index into the global test vector bitmap.

Arguments:

The first argument is a pointer to a global variable containing the name of the entity being instrumented. This should generally be the (mangled) function name for a set of counters.

The second argument is a hash value that can be used by the consumer of the profile data to detect changes to the instrumented source.

The third argument is the bit index into the global test vector bitmap corresponding to the function.

The fourth argument is the address of the condition bitmap, which contains a value representing an executed MC/DC test vector. It is loaded and used as the bit index of the test vector bitmap.

Semantics:

This intrinsic represents the final operation of an MC/DC instrumentation sequence and will cause the -instrprof pass to generate the code to instrument an update of a function’s global test vector bitmap to indicate that a test vector has been executed. The global test vector bitmap can be consumed by the llvm-profdata and llvm-cov tools.

‘llvm.thread.pointer’ Intrinsic

Syntax:
declare ptr @llvm.thread.pointer.p0()
declare ptr addrspace(5) @llvm.thread.pointer.p5()
Overview:

The ‘llvm.thread.pointer’ intrinsic returns the value of the thread pointer.

Semantics:

The ‘llvm.thread.pointer’ intrinsic returns a pointer to the TLS area for the current thread. The exact semantics of this value are target specific: it may point to the start of TLS area, to the end, or somewhere in the middle. Depending on the target, this intrinsic may read a register, call a helper function, read from an alternate memory space, or perform other operations necessary to locate the TLS area. Not all targets support this intrinsic. The address space must be the globals address space.

‘llvm.call.preallocated.setup’ Intrinsic

Syntax:
declare token @llvm.call.preallocated.setup(i32 %num_args)
Overview:

The ‘llvm.call.preallocated.setup’ intrinsic returns a token which can be used with a call’s "preallocated" operand bundle to indicate that certain arguments are allocated and initialized before the call.

Semantics:

The ‘llvm.call.preallocated.setup’ intrinsic returns a token which is associated with at most one call. The token can be passed to ‘@llvm.call.preallocated.arg’ to get a pointer to the corresponding argument. The token must be the parameter to a "preallocated" operand bundle for the corresponding call.

Nested calls to ‘llvm.call.preallocated.setup’ are allowed, but must be properly nested. For example,

%t1 = call token @llvm.call.preallocated.setup(i32 0)
%t2 = call token @llvm.call.preallocated.setup(i32 0)
call void @foo() ["preallocated"(token %t2)]
call void @foo() ["preallocated"(token %t1)]

is allowed, but not

%t1 = call token @llvm.call.preallocated.setup(i32 0)
%t2 = call token @llvm.call.preallocated.setup(i32 0)
call void @foo() ["preallocated"(token %t1)]
call void @foo() ["preallocated"(token %t2)]

‘llvm.call.preallocated.arg’ Intrinsic

Syntax:
declare ptr @llvm.call.preallocated.arg(token %setup_token, i32 %arg_index)
Overview:

The ‘llvm.call.preallocated.arg’ intrinsic returns a pointer to the corresponding preallocated argument for the preallocated call.

Semantics:

The ‘llvm.call.preallocated.arg’ intrinsic returns a pointer to the %arg_index-th argument with the preallocated attribute for the call associated with the %setup_token, which must be from ‘llvm.call.preallocated.setup’.

A call to ‘llvm.call.preallocated.arg’ must have a call site preallocated attribute. The type of the preallocated attribute must match the type used by the preallocated attribute of the corresponding argument at the preallocated call. The type is used in the case that an llvm.call.preallocated.setup does not have a corresponding call (e.g. due to DCE), where otherwise we cannot know how large the arguments are.

It is undefined behavior if this is called with a token from an ‘llvm.call.preallocated.setup’ if another ‘llvm.call.preallocated.setup’ has already been called or if the preallocated call corresponding to the ‘llvm.call.preallocated.setup’ has already been called.

‘llvm.call.preallocated.teardown’ Intrinsic

Syntax:
declare void @llvm.call.preallocated.teardown(token %setup_token)
Overview:

The ‘llvm.call.preallocated.teardown’ intrinsic cleans up the stack created by a ‘llvm.call.preallocated.setup’.

Semantics:

The token argument must be a token returned by ‘llvm.call.preallocated.setup’.

The ‘llvm.call.preallocated.teardown’ intrinsic cleans up the stack allocated by the corresponding ‘llvm.call.preallocated.setup’. Exactly one of this or the preallocated call must be called to prevent stack leaks. It is undefined behavior to call both a ‘llvm.call.preallocated.teardown’ and the preallocated call for a given ‘llvm.call.preallocated.setup’.

For example, if the stack is allocated for a preallocated call by a ‘llvm.call.preallocated.setup’, then an initializer function called on an allocated argument throws an exception, there should be a ‘llvm.call.preallocated.teardown’ in the exception handler to prevent stack leaks.

Following the nesting rules in ‘llvm.call.preallocated.setup’, nested calls to ‘llvm.call.preallocated.setup’ and ‘llvm.call.preallocated.teardown’ are allowed but must be properly nested.

Example:
    %cs = call token @llvm.call.preallocated.setup(i32 1)
    %x = call ptr @llvm.call.preallocated.arg(token %cs, i32 0) preallocated(i32)
    invoke void @constructor(ptr %x) to label %conta unwind label %contb
conta:
    call void @foo1(ptr preallocated(i32) %x) ["preallocated"(token %cs)]
    ret void
contb:
    %s = catchswitch within none [label %catch] unwind to caller
catch:
    %p = catchpad within %s []
    call void @llvm.call.preallocated.teardown(token %cs)
    ret void

Standard C/C++ Library Intrinsics

LLVM provides intrinsics for a few important standard C/C++ library functions. These intrinsics allow source-language front-ends to pass information about the alignment of the pointer arguments to the code generator, providing opportunity for more efficient code generation.

‘llvm.abs.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.abs on any integer bit width or any vector of integer elements.

declare i32 @llvm.abs.i32(i32 <src>, i1 <is_int_min_poison>)
declare <4 x i32> @llvm.abs.v4i32(<4 x i32> <src>, i1 <is_int_min_poison>)
Overview:

The ‘llvm.abs’ family of intrinsic functions returns the absolute value of an argument.

Arguments:

The first argument is the value for which the absolute value is to be returned. This argument may be of any integer type or a vector with integer element type. The return type must match the first argument type.

The second argument must be a constant and is a flag to indicate whether the result value of the ‘llvm.abs’ intrinsic is a poison value if the first argument is statically or dynamically an INT_MIN value.

Semantics:

The ‘llvm.abs’ intrinsic returns the magnitude (non-negative, except in the INT_MIN case below) of the first argument or each element of a vector argument. If the first argument is INT_MIN, then the result is also INT_MIN if is_int_min_poison == 0 and poison otherwise.
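
The effect of the flag can be sketched with two wrappers that differ only in the second argument. The function names @abs_wrapping and @abs_nowrap are hypothetical.

```llvm
declare i32 @llvm.abs.i32(i32, i1)

define i32 @abs_wrapping(i32 %x) {
entry:
  ; is_int_min_poison = false: an INT_MIN input yields INT_MIN.
  %r = call i32 @llvm.abs.i32(i32 %x, i1 false)
  ret i32 %r
}

define i32 @abs_nowrap(i32 %x) {
entry:
  ; is_int_min_poison = true: an INT_MIN input yields poison.
  %r = call i32 @llvm.abs.i32(i32 %x, i1 true)
  ret i32 %r
}
```

A frontend for a language in which negating INT_MIN is undefined could plausibly use the second form, while wrapping semantics call for the first.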

‘llvm.smax.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use @llvm.smax on any integer bit width or any vector of integer elements.

declare i32 @llvm.smax.i32(i32 %a, i32 %b)
declare <4 x i32> @llvm.smax.v4i32(<4 x i32> %a, <4 x i32> %b)
Overview:

Return the larger of %a and %b comparing the values as signed integers. Vector intrinsics operate on a per-element basis. The larger element of %a and %b at a given index is returned for that index.

Arguments:

The arguments (%a and %b) may be of any integer type or a vector with integer element type. The argument types must match each other, and the return type must match the argument type.

‘llvm.smin.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use @llvm.smin on any integer bit width or any vector of integer elements.

declare i32 @llvm.smin.i32(i32 %a, i32 %b)
declare <4 x i32> @llvm.smin.v4i32(<4 x i32> %a, <4 x i32> %b)
Overview:

Return the smaller of %a and %b comparing the values as signed integers. Vector intrinsics operate on a per-element basis. The smaller element of %a and %b at a given index is returned for that index.

Arguments:

The arguments (%a and %b) may be of any integer type or a vector with integer element type. The argument types must match each other, and the return type must match the argument type.

‘llvm.umax.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use @llvm.umax on any integer bit width or any vector of integer elements.

declare i32 @llvm.umax.i32(i32 %a, i32 %b)
declare <4 x i32> @llvm.umax.v4i32(<4 x i32> %a, <4 x i32> %b)
Overview:

Return the larger of %a and %b comparing the values as unsigned integers. Vector intrinsics operate on a per-element basis. The larger element of %a and %b at a given index is returned for that index.

Arguments:

The arguments (%a and %b) may be of any integer type or a vector with integer element type. The argument types must match each other, and the return type must match the argument type.

‘llvm.umin.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use @llvm.umin on any integer bit width or any vector of integer elements.

declare i32 @llvm.umin.i32(i32 %a, i32 %b)
declare <4 x i32> @llvm.umin.v4i32(<4 x i32> %a, <4 x i32> %b)
Overview:

Return the smaller of %a and %b comparing the values as unsigned integers. Vector intrinsics operate on a per-element basis. The smaller element of %a and %b at a given index is returned for that index.

Arguments:

The arguments (%a and %b) may be of any integer type or a vector with integer element type. The argument types must match each other, and the return type must match the argument type.
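
As an illustration of how the min/max family composes, the sketch below clamps a signed scalar into a range and caps each lane of an unsigned vector. The function names @clamp and @cap_lanes are hypothetical, and @clamp assumes the caller passes %lo <= %hi.

```llvm
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.smin.i32(i32, i32)
declare <4 x i32> @llvm.umin.v4i32(<4 x i32>, <4 x i32>)

define i32 @clamp(i32 %x, i32 %lo, i32 %hi) {
entry:
  ; Raise %x to at least %lo, then lower it to at most %hi (signed).
  %at_least_lo = call i32 @llvm.smax.i32(i32 %x, i32 %lo)
  %clamped = call i32 @llvm.smin.i32(i32 %at_least_lo, i32 %hi)
  ret i32 %clamped
}

define <4 x i32> @cap_lanes(<4 x i32> %v) {
entry:
  ; Each lane is compared independently as an unsigned value.
  %r = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %v, <4 x i32> <i32 255, i32 255, i32 255, i32 255>)
  ret <4 x i32> %r
}
```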

‘llvm.scmp.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use @llvm.scmp on any integer bit width or any vector of integer elements.

declare i2 @llvm.scmp.i2.i32(i32 %a, i32 %b)
declare <4 x i32> @llvm.scmp.v4i32.v4i32(<4 x i32> %a, <4 x i32> %b)
Overview:

Return -1 if %a is signed less than %b, 0 if they are equal, and 1 if %a is signed greater than %b. Vector intrinsics operate on a per-element basis.

Arguments:

The arguments (%a and %b) may be of any integer type or a vector with integer element type. The argument types must match each other, and the return type must be at least as wide as i2, to hold the three possible return values.

‘llvm.ucmp.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use @llvm.ucmp on any integer bit width or any vector of integer elements.

declare i2 @llvm.ucmp.i2.i32(i32 %a, i32 %b)
declare <4 x i32> @llvm.ucmp.v4i32.v4i32(<4 x i32> %a, <4 x i32> %b)
Overview:

Return -1 if %a is unsigned less than %b, 0 if they are equal, and 1 if %a is unsigned greater than %b. Vector intrinsics operate on a per-element basis.

Arguments:

The arguments (%a and %b) may be of any integer type or a vector with integer element type. The argument types must match each other, and the return type must be at least as wide as i2, to hold the three possible return values.
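
The difference between the signed and unsigned three-way comparisons shows up when the same bit pattern is interpreted both ways. In this sketch the function names are hypothetical and the i8 result type is an arbitrary choice that satisfies the "at least as wide as i2" rule.

```llvm
declare i8 @llvm.scmp.i8.i32(i32, i32)
declare i8 @llvm.ucmp.i8.i32(i32, i32)

; Signed view: -1 is less than 1, so the result is -1.
define i8 @signed_order() {
entry:
  %r = call i8 @llvm.scmp.i8.i32(i32 -1, i32 1)
  ret i8 %r
}

; Unsigned view: the bit pattern of -1 is 4294967295, which is
; greater than 1, so the result is 1.
define i8 @unsigned_order() {
entry:
  %r = call i8 @llvm.ucmp.i8.i32(i32 -1, i32 1)
  ret i8 %r
}
```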

‘llvm.memcpy’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memcpy on any integer bit width and for different address spaces. Not all targets support all bit widths however.

declare void @llvm.memcpy.p0.p0.i32(ptr <dest>, ptr <src>, i32 <len>, i1 <isvolatile>)
declare void @llvm.memcpy.p0.p0.i64(ptr <dest>, ptr <src>, i64 <len>, i1 <isvolatile>)
Overview:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location.

Note that, unlike the standard libc function, the llvm.memcpy.* intrinsics do not return a value, take an extra isvolatile argument, and the pointers can be in specified address spaces.

Arguments:

The first argument is a pointer to the destination, the second is a pointer to the source. The third argument is an integer argument specifying the number of bytes to copy, and the fourth is a boolean indicating a volatile access.

The align parameter attribute can be provided for the first and second arguments.

If the isvolatile parameter is true, the llvm.memcpy call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

If <len> is 0, it is a no-op modulo the behavior of attributes attached to the arguments. If <len> is not a well-defined value, the behavior is undefined. If <len> is not zero, both <dest> and <src> should be well-defined, otherwise the behavior is undefined.
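
A call site that supplies alignment through parameter attributes might look like the following sketch. The function name @copy16 and the 4-byte alignment are assumptions made for illustration.

```llvm
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @copy16(ptr %dst, ptr %src) {
entry:
  ; Copy 16 bytes, non-volatile; both pointers are asserted to be
  ; 4-byte aligned, and the regions must be equal or non-overlapping.
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %dst, ptr align 4 %src, i64 16, i1 false)
  ret void
}
```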

‘llvm.memcpy.inline’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memcpy.inline on any integer bit width and for different address spaces. Not all targets support all bit widths, however.

declare void @llvm.memcpy.inline.p0.p0.i32(ptr <dest>, ptr <src>, i32 <len>, i1 <isvolatile>)
declare void @llvm.memcpy.inline.p0.p0.i64(ptr <dest>, ptr <src>, i64 <len>, i1 <isvolatile>)

Overview:

The ‘llvm.memcpy.inline.*’ intrinsics copy a block of memory from the source location to the destination location and guarantee that no external functions are called.

Note that, unlike the standard libc function, the llvm.memcpy.inline.* intrinsics do not return a value, take an extra isvolatile argument, and the pointers can be in specified address spaces.

Arguments:

The first argument is a pointer to the destination, the second is a pointer to the source. The third argument is an integer argument specifying the number of bytes to copy, and the fourth is a boolean indicating a volatile access.

The align parameter attribute can be provided for the first and second arguments.

If the isvolatile parameter is true, the llvm.memcpy.inline call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The ‘llvm.memcpy.inline.*’ intrinsics copy a block of memory from the source location to the destination location, which are not allowed to overlap. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

The behavior of ‘llvm.memcpy.inline.*’ is equivalent to the behavior of ‘llvm.memcpy.*’, but the generated code is guaranteed not to call any external functions.

‘llvm.memmove’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memmove on any integer bit width and for different address spaces. Not all targets support all bit widths, however.

declare void @llvm.memmove.p0.p0.i32(ptr <dest>, ptr <src>, i32 <len>, i1 <isvolatile>)
declare void @llvm.memmove.p0.p0.i64(ptr <dest>, ptr <src>, i64 <len>, i1 <isvolatile>)

Overview:

The ‘llvm.memmove.*’ intrinsics move a block of memory from the source location to the destination location. It is similar to the ‘llvm.memcpy’ intrinsic but allows the two memory locations to overlap.

Note that, unlike the standard libc function, the llvm.memmove.* intrinsics do not return a value, take an extra isvolatile argument, and the pointers can be in specified address spaces.

Arguments:

The first argument is a pointer to the destination, the second is a pointer to the source. The third argument is an integer argument specifying the number of bytes to copy, and the fourth is a boolean indicating a volatile access.

The align parameter attribute can be provided for the first and second arguments.

If the isvolatile parameter is true, the llvm.memmove call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The ‘llvm.memmove.*’ intrinsics copy a block of memory from the source location to the destination location, which may overlap. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

If <len> is 0, it is a no-op modulo the behavior of attributes attached to the arguments. If <len> is not a well-defined value, the behavior is undefined. If <len> is not zero, both <dest> and <src> should be well-defined, otherwise the behavior is undefined.
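To illustrate why an overlap-tolerant copy matters, here is a small C++ sketch that uses the libc `std::memmove` as a stand-in for the intrinsic. The function `shift_right` is a hypothetical example, not an LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies the first n bytes of s to offset `shift` within the same buffer.
std::string shift_right(std::string s, std::size_t n, std::size_t shift) {
    assert(shift + n <= s.size());  // the destination range must fit in the buffer
    // Source [0, n) and destination [shift, shift + n) overlap whenever
    // shift < n, so the copy must behave as if it went through a temporary.
    std::memmove(&s[shift], &s[0], n);
    return s;
}
```

For instance, `shift_right("abcdef", 4, 2)` produces `"ababcd"`: the original first four bytes arrive intact at offset 2 even though the ranges overlap.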

‘llvm.memset.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.memset on any integer bit width and for different address spaces. However, not all targets support all bit widths.

declare void @llvm.memset.p0.i32(ptr <dest>, i8 <val>, i32 <len>, i1 <isvolatile>)
declare void @llvm.memset.p0.i64(ptr <dest>, i8 <val>, i64 <len>, i1 <isvolatile>)

Overview:

The ‘llvm.memset.*’ intrinsics fill a block of memory with a particular byte value.

Note that, unlike the standard libc function, the llvm.memset intrinsic does not return a value and takes an extra isvolatile argument. Also, the destination can be in an arbitrary address space.

Arguments:

The first argument is a pointer to the destination to fill, the second is the byte value with which to fill it, the third argument is an integer argument specifying the number of bytes to fill, and the fourth is a boolean indicating a volatile access.

The align parameter attribute can be provided for the first argument.

If the isvolatile parameter is true, the llvm.memset call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The ‘llvm.memset.*’ intrinsics fill “len” bytes of memory starting at the destination location. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

If <len> is 0, it is a no-op modulo the behavior of attributes attached to the arguments. If <len> is not a well-defined value, the behavior is undefined. If <len> is not zero, <dest> should be well-defined, otherwise the behavior is undefined.

‘llvm.memset.inline’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memset.inline on any integer bit width and for different address spaces. Not all targets support all bit widths, however.

declare void @llvm.memset.inline.p0.i32(ptr <dest>, i8 <val>, i32 <len>, i1 <isvolatile>)
declare void @llvm.memset.inline.p0.i64(ptr <dest>, i8 <val>, i64 <len>, i1 <isvolatile>)

Overview:

The ‘llvm.memset.inline.*’ intrinsics fill a block of memory with a particular byte value and guarantee that no external functions are called.

Note that, unlike the standard libc function, the llvm.memset.inline.* intrinsics do not return a value, take an extra isvolatile argument, and the pointer can be in specified address spaces.

Arguments:

The first argument is a pointer to the destination to fill, the second is the byte value with which to fill it, the third argument is a constant integer argument specifying the number of bytes to fill, and the fourth is a boolean indicating a volatile access.

The align parameter attribute can be provided for the first argument.

If the isvolatile parameter is true, the llvm.memset.inline call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The ‘llvm.memset.inline.*’ intrinsics fill “len” bytes of memory starting at the destination location. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

If <len> is 0, it is a no-op modulo the behavior of attributes attached to the arguments. If <len> is not a well-defined value, the behavior is undefined. If <len> is not zero, <dest> should be well-defined, otherwise the behavior is undefined.

The behavior of ‘llvm.memset.inline.*’ is equivalent to the behavior of ‘llvm.memset.*’, but the generated code is guaranteed not to call any external functions.

‘llvm.experimental.memset.pattern’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.experimental.memset.pattern on any sized type and for different address spaces.

declare void @llvm.experimental.memset.pattern.p0.i128.i64(ptr <dest>, i128 <val>, i64 <count>, i1 <isvolatile>)

Overview:

The ‘llvm.experimental.memset.pattern.*’ intrinsics fill a block of memory with a particular value. This may be expanded to an inline loop, a sequence of stores, or a libcall depending on what is available for the target and the expected performance and code size impact.

Arguments:

The first argument is a pointer to the destination to fill, the second is the value with which to fill it, the third argument is an integer argument specifying the number of times to fill the value, and the fourth is a boolean indicating a volatile access.

The align parameter attribute can be provided for the first argument.

If the isvolatile parameter is true, the llvm.experimental.memset.pattern call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The ‘llvm.experimental.memset.pattern.*’ intrinsic fills memory starting at the destination location with the given pattern <count> times, incrementing by the allocation size of the type each time. The stores follow the usual semantics of store instructions, including regarding endianness and padding. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

If <count> is 0, it is a no-op modulo the behavior of attributes attached to the arguments. If <count> is not a well-defined value, the behavior is undefined. If <count> is not zero, <dest> should be well-defined, otherwise the behavior is undefined.
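The "store the pattern, then advance by its allocation size" behavior can be sketched in C++ for a 32-bit pattern. The helper `memset_pattern_u32` is hypothetical, and the sketch assumes the allocation size of the pattern type equals `sizeof(val)`, which is true for a 32-bit integer but need not hold for every type.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model: store `val` `count` times starting at `dest`,
// advancing by sizeof(val) bytes after each store.
inline void memset_pattern_u32(void *dest, std::uint32_t val, std::size_t count) {
    auto *p = static_cast<unsigned char *>(dest);
    for (std::size_t i = 0; i < count; ++i) {
        // One store of the whole pattern in the host's native byte order.
        std::memcpy(p + i * sizeof(val), &val, sizeof(val));
    }
}
```

Note that the third operand counts pattern repetitions rather than bytes, so two repetitions of a 32-bit pattern write eight bytes.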

‘llvm.sqrt.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.sqrt on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.sqrt.f32(float %Val)
declare double @llvm.sqrt.f64(double %Val)
declare x86_fp80 @llvm.sqrt.f80(x86_fp80 %Val)
declare fp128 @llvm.sqrt.f128(fp128 %Val)
declare ppc_fp128 @llvm.sqrt.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.sqrt’ intrinsics return the square root of the specified value.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘sqrt’ function but without trapping or setting errno. For types specified by IEEE-754, the result matches a conforming libm implementation.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.powi.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.powi on any floating-point or vector of floating-point type. Not all targets support all types, however.

Generally, the only supported type for the exponent is the one matching the C type int.

declare float @llvm.powi.f32.i32(float %Val, i32 %power)
declare double @llvm.powi.f64.i16(double %Val, i16 %power)
declare x86_fp80 @llvm.powi.f80.i32(x86_fp80 %Val, i32 %power)
declare fp128 @llvm.powi.f128.i32(fp128 %Val, i32 %power)
declare ppc_fp128 @llvm.powi.ppcf128.i32(ppc_fp128 %Val, i32 %power)

Overview:

The ‘llvm.powi.*’ intrinsics return the first operand raised to the specified (positive or negative) power. The order of evaluation of multiplications is not defined. When a vector of floating-point type is used, the second argument remains a scalar integer value.

Arguments:

The second argument is an integer power, and the first is a value to raise to that power.

Semantics:

This function returns the first value raised to the second power with an unspecified sequence of rounding operations.
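One possible multiplication order is exponentiation by squaring, sketched below in C++. This is an illustration only: `powi_model` is a hypothetical name, and because the sequence of rounding operations is unspecified, an actual lowering may round differently for inputs where the intermediate products are inexact.

```cpp
// Hypothetical model of an integer-power operation using exponentiation
// by squaring; other multiplication orders are equally valid.
inline double powi_model(double val, int power) {
    long long n = power;  // widen first so negating the minimum int is safe
    const bool negative = n < 0;
    if (negative) {
        n = -n;
    }
    double result = 1.0;
    double base = val;
    while (n > 0) {
        if (n & 1) {
            result *= base;  // fold in the current power of val
        }
        base *= base;        // square for the next bit of the exponent
        n >>= 1;
    }
    return negative ? 1.0 / result : result;
}
```

With exactly representable intermediates the result is exact, for example `powi_model(2.0, 10)` is 1024.0 and `powi_model(2.0, -2)` is 0.25.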

‘llvm.sin.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.sin on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.sin.f32(float %Val)
declare double @llvm.sin.f64(double %Val)
declare x86_fp80 @llvm.sin.f80(x86_fp80 %Val)
declare fp128 @llvm.sin.f128(fp128 %Val)
declare ppc_fp128 @llvm.sin.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.sin.*’ intrinsics return the sine of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘sin’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.cos.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.cos on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.cos.f32(float %Val)
declare double @llvm.cos.f64(double %Val)
declare x86_fp80 @llvm.cos.f80(x86_fp80 %Val)
declare fp128 @llvm.cos.f128(fp128 %Val)
declare ppc_fp128 @llvm.cos.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.cos.*’ intrinsics return the cosine of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘cos’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.tan.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.tan on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.tan.f32(float %Val)
declare double @llvm.tan.f64(double %Val)
declare x86_fp80 @llvm.tan.f80(x86_fp80 %Val)
declare fp128 @llvm.tan.f128(fp128 %Val)
declare ppc_fp128 @llvm.tan.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.tan.*’ intrinsics return the tangent of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘tan’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.asin.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.asin on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.asin.f32(float %Val)
declare double @llvm.asin.f64(double %Val)
declare x86_fp80 @llvm.asin.f80(x86_fp80 %Val)
declare fp128 @llvm.asin.f128(fp128 %Val)
declare ppc_fp128 @llvm.asin.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.asin.*’ intrinsics return the arcsine of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘asin’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.acos.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.acos on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.acos.f32(float %Val)
declare double @llvm.acos.f64(double %Val)
declare x86_fp80 @llvm.acos.f80(x86_fp80 %Val)
declare fp128 @llvm.acos.f128(fp128 %Val)
declare ppc_fp128 @llvm.acos.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.acos.*’ intrinsics return the arccosine of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘acos’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.atan.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.atan on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.atan.f32(float %Val)
declare double @llvm.atan.f64(double %Val)
declare x86_fp80 @llvm.atan.f80(x86_fp80 %Val)
declare fp128 @llvm.atan.f128(fp128 %Val)
declare ppc_fp128 @llvm.atan.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.atan.*’ intrinsics return the arctangent of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘atan’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.atan2.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.atan2 on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.atan2.f32(float %Y, float %X)
declare double @llvm.atan2.f64(double %Y, double %X)
declare x86_fp80 @llvm.atan2.f80(x86_fp80 %Y, x86_fp80 %X)
declare fp128 @llvm.atan2.f128(fp128 %Y, fp128 %X)
declare ppc_fp128 @llvm.atan2.ppcf128(ppc_fp128 %Y, ppc_fp128 %X)

Overview:

The ‘llvm.atan2.*’ intrinsics return the arctangent of Y/X, accounting for the quadrant.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘atan2’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.
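What "accounting for the quadrant" means can be seen by comparing the two-argument arctangent with a plain arctangent of the quotient. The C++ sketch below uses the libm functions as stand-ins for the intrinsics; the helper names are hypothetical.

```cpp
#include <cmath>

// Angle of the point (x, y) in radians, in the range [-pi, pi].
inline double angle_of(double y, double x) {
    return std::atan2(y, x);
}

// Naive variant: dividing first discards the signs of y and x, so points
// in opposite quadrants become indistinguishable.
inline double naive_angle_of(double y, double x) {
    return std::atan(y / x);
}
```

For the point (x, y) = (-1, 1), which lies in the second quadrant, `angle_of` returns approximately 3*pi/4, while `naive_angle_of` returns approximately -pi/4.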

‘llvm.sinh.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.sinh on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.sinh.f32(float %Val)
declare double @llvm.sinh.f64(double %Val)
declare x86_fp80 @llvm.sinh.f80(x86_fp80 %Val)
declare fp128 @llvm.sinh.f128(fp128 %Val)
declare ppc_fp128 @llvm.sinh.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.sinh.*’ intrinsics return the hyperbolic sine of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘sinh’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.cosh.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.cosh on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.cosh.f32(float %Val)
declare double @llvm.cosh.f64(double %Val)
declare x86_fp80 @llvm.cosh.f80(x86_fp80 %Val)
declare fp128 @llvm.cosh.f128(fp128 %Val)
declare ppc_fp128 @llvm.cosh.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.cosh.*’ intrinsics return the hyperbolic cosine of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘cosh’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.tanh.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.tanh on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.tanh.f32(float %Val)
declare double @llvm.tanh.f64(double %Val)
declare x86_fp80 @llvm.tanh.f80(x86_fp80 %Val)
declare fp128 @llvm.tanh.f128(fp128 %Val)
declare ppc_fp128 @llvm.tanh.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.tanh.*’ intrinsics return the hyperbolic tangent of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘tanh’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.sincos.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.sincos on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)

Overview:

The ‘llvm.sincos.*’ intrinsics return the sine and cosine of the operand.

Arguments:

The argument is a floating-point value or vector of floating-point values. Returns two values matching the argument type in a struct.

Semantics:

This intrinsic is equivalent to calling both llvm.sin and llvm.cos on the argument.

The first result is the sine of the argument and the second result is the cosine of the argument.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.
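The two-result shape described above can be modeled in C++ with a small struct. The names `SinCos` and `sincos_model` are hypothetical, and the libm `std::sin` and `std::cos` calls are stand-ins for the corresponding intrinsics.

```cpp
#include <cmath>

// Field order mirrors the result order: sine first, cosine second.
struct SinCos {
    double sine;
    double cosine;
};

// Hypothetical model: compute both results from a single argument.
inline SinCos sincos_model(double x) {
    return {std::sin(x), std::cos(x)};
}
```

A combined operation gives a target the opportunity to share work, such as argument reduction, between the two results.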

‘llvm.sincospi.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.sincospi on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare { float, float } @llvm.sincospi.f32(float %Val)
declare { double, double } @llvm.sincospi.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val)

Overview:

The ‘llvm.sincospi.*’ intrinsics return the sine and cosine of pi*operand.

Arguments:

The argument is a floating-point value or vector of floating-point values. Returns two values matching the argument type in a struct.

Semantics:

This is equivalent to the llvm.sincos.* intrinsic where the argument has been multiplied by pi; however, it computes the result more accurately, especially for large input values.

Note

Currently, the default lowering of this intrinsic relies on the sincospi[f|l] functions being available in the target’s runtime (e.g. libc).

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.
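One way to see where the accuracy gain for large inputs can come from: sin(pi*x) and cos(pi*x) have period 2 in x, so the argument can be reduced exactly before any multiplication by pi. The C++ sketch below shows only that idea; `sincospi_sketch` is a hypothetical name, and production implementations of sincospi use more careful reduction and evaluation than this.

```cpp
#include <cmath>

struct SinCosPi {
    double sine;
    double cosine;
};

// Illustrative only: reduce x modulo the period 2 (std::fmod is exact),
// then evaluate on the small reduced argument.
inline SinCosPi sincospi_sketch(double x) {
    const double pi = std::acos(-1.0);
    const double r = std::fmod(x, 2.0);  // r is in (-2, 2) with no rounding error
    return {std::sin(pi * r), std::cos(pi * r)};
}
```

For a large even integer such as 1e15, the reduced argument is exactly 0, so the sketch returns a sine of exactly 0 and a cosine of exactly 1. Multiplying 1e15 by a rounded value of pi first would already introduce an error in the argument before the sine or cosine is evaluated.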

‘llvm.modf.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.modf on any floating-point or vector of floating-point type. However, not all targets support all types.

declare { float, float } @llvm.modf.f32(float %Val)
declare { double, double } @llvm.modf.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val)

Overview:

The ‘llvm.modf.*’ intrinsics return the operand’s integral and fractional parts.

Arguments:

The argument is a floating-point value or vector of floating-point values. Returns two values matching the argument type in a struct.

Semantics:

Return the same values as a corresponding libm ‘modf’ function without trapping or setting errno.

The first result is the fractional part of the operand and the second result is the integral part of the operand. Both results have the same sign as the operand.

Not including exceptional inputs (listed below), llvm.modf.* is semantically equivalent to:

%fp = frem <fptype> %x, 1.0 ; Fractional part
%ip = fsub <fptype> %x, %fp ; Integral part

(assuming no floating-point precision errors)

If the argument is a zero, returns a zero with the same sign for both the fractional and integral parts.

If the argument is an infinity, returns a fractional part of zero with the same sign, and infinity with the same sign as the integral part.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.
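The result order and the sign rules can be checked against the libm function the text refers to. In this C++ sketch, `ModfResult` and `modf_model` are hypothetical names; the struct fields are ordered to match the description above (fractional part first, integral part second), which differs from the C calling convention where the integral part is written through a pointer.

```cpp
#include <cmath>

// Fractional part first, integral part second, as in the description above.
struct ModfResult {
    double fractional;
    double integral;
};

// Hypothetical model built on the libm function.
inline ModfResult modf_model(double x) {
    double integral = 0.0;
    const double fractional = std::modf(x, &integral);
    return {fractional, integral};
}
```

For example, `modf_model(-3.25)` yields a fractional part of -0.25 and an integral part of -3.0, both carrying the sign of the operand.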

‘llvm.pow.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.pow on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.pow.f32(float %Val, float %Power)
declare double @llvm.pow.f64(double %Val, double %Power)
declare x86_fp80 @llvm.pow.f80(x86_fp80 %Val, x86_fp80 %Power)
declare fp128 @llvm.pow.f128(fp128 %Val, fp128 %Power)
declare ppc_fp128 @llvm.pow.ppcf128(ppc_fp128 %Val, ppc_fp128 %Power)

Overview:

The ‘llvm.pow.*’ intrinsics return the first operand raised to the specified (positive or negative) power.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘pow’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.exp.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.exp on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.exp.f32(float %Val)
declare double @llvm.exp.f64(double %Val)
declare x86_fp80 @llvm.exp.f80(x86_fp80 %Val)
declare fp128 @llvm.exp.f128(fp128 %Val)
declare ppc_fp128 @llvm.exp.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.exp.*’ intrinsics compute the base-e exponential of the specified value.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘exp’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.exp2.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.exp2 on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.exp2.f32(float %Val)
declare double @llvm.exp2.f64(double %Val)
declare x86_fp80 @llvm.exp2.f80(x86_fp80 %Val)
declare fp128 @llvm.exp2.f128(fp128 %Val)
declare ppc_fp128 @llvm.exp2.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.exp2.*’ intrinsics compute the base-2 exponential of the specified value.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘exp2’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.exp10.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.exp10 on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.exp10.f32(float %Val)
declare double @llvm.exp10.f64(double %Val)
declare x86_fp80 @llvm.exp10.f80(x86_fp80 %Val)
declare fp128 @llvm.exp10.f128(fp128 %Val)
declare ppc_fp128 @llvm.exp10.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.exp10.*’ intrinsics compute the base-10 exponential of the specified value.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘exp10’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.ldexp.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.ldexp on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.ldexp.f32.i32(float %Val, i32 %Exp)
declare double @llvm.ldexp.f64.i32(double %Val, i32 %Exp)
declare x86_fp80 @llvm.ldexp.f80.i32(x86_fp80 %Val, i32 %Exp)
declare fp128 @llvm.ldexp.f128.i32(fp128 %Val, i32 %Exp)
declare ppc_fp128 @llvm.ldexp.ppcf128.i32(ppc_fp128 %Val, i32 %Exp)
declare <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> %Val, <2 x i32> %Exp)

Overview:

The ‘llvm.ldexp.*’ intrinsics perform the ldexp function.

Arguments:

The first argument and the return value are floating-point or vector of floating-point values of the same type. The second argument is an integer with the same number of elements.

Semantics:

This function multiplies the first argument by 2 raised to the second argument’s power. If the first argument is NaN or infinite, the same value is returned. If the result underflows, a zero with the same sign is returned. If the result overflows, the result is an infinity with the same sign.
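The edge cases listed above match the behavior of the libm function of the same name, which the following C++ sketch uses as a stand-in. The wrapper name `ldexp_model` is hypothetical.

```cpp
#include <cmath>

// Hypothetical model: val * 2^exp, computed by adjusting the exponent
// rather than by a general multiplication.
inline double ldexp_model(double val, int exp) {
    return std::ldexp(val, exp);
}
```

For instance, `ldexp_model(0.75, 4)` is exactly 12.0. A very large exponent such as 5000 overflows to an infinity with the sign of the first argument, and a very small one such as -5000 underflows to a zero with that sign.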

‘llvm.frexp.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.frexp on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare { float, i32 } @llvm.frexp.f32.i32(float %Val)
declare { double, i32 } @llvm.frexp.f64.i32(double %Val)
declare { x86_fp80, i32 } @llvm.frexp.f80.i32(x86_fp80 %Val)
declare { fp128, i32 } @llvm.frexp.f128.i32(fp128 %Val)
declare { ppc_fp128, i32 } @llvm.frexp.ppcf128.i32(ppc_fp128 %Val)
declare { <2 x float>, <2 x i32> } @llvm.frexp.v2f32.v2i32(<2 x float> %Val)

Overview:

The ‘llvm.frexp.*’ intrinsics perform the frexp function.

Arguments:

The argument is a floating-point or vector of floating-point values. Returns two values in a struct. The first struct field matches the argument type, and the second field is an integer or a vector of integer values with the same number of elements as the argument.

Semantics:

This intrinsic splits a floating-point value into a normalized fractional component and integral exponent.

For a non-zero argument, returns the argument multiplied by some power of two such that the absolute value of the returned value is in the range [0.5, 1.0), with the same sign as the argument. The second result is an integer such that the first result multiplied by 2 raised to the power of the second result is the input argument.

If the argument is a zero, returns a zero with the same sign and a 0 exponent.

If the argument is a NaN, a NaN is returned and the returned exponent is unspecified.

If the argument is an infinity, returns an infinity with the same sign and an unspecified exponent.
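The decomposition and its inverse can be sketched in C++ with the libm functions as stand-ins. `FrexpResult`, `frexp_model`, and `recompose` are hypothetical names introduced for this illustration.

```cpp
#include <cmath>

// Fraction first, exponent second, mirroring the two struct fields above.
struct FrexpResult {
    double fraction;
    int exponent;
};

// Hypothetical model: split x into fraction * 2^exponent.
inline FrexpResult frexp_model(double x) {
    int exponent = 0;
    const double fraction = std::frexp(x, &exponent);
    return {fraction, exponent};
}

// Inverse operation: scale the fraction back by 2^exponent.
inline double recompose(FrexpResult r) {
    return std::ldexp(r.fraction, r.exponent);
}
```

For example, 12.0 decomposes into a fraction of 0.75 and an exponent of 4, since 0.75 * 2^4 = 12.0, and recomposing a finite non-zero value returns it exactly.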

‘llvm.log.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.log on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.log.f32(float %Val)
declare double @llvm.log.f64(double %Val)
declare x86_fp80 @llvm.log.f80(x86_fp80 %Val)
declare fp128 @llvm.log.f128(fp128 %Val)
declare ppc_fp128 @llvm.log.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.log.*’ intrinsics compute the base-e logarithm of the specified value.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘log’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.log10.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.log10 on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.log10.f32(float %Val)
declare double @llvm.log10.f64(double %Val)
declare x86_fp80 @llvm.log10.f80(x86_fp80 %Val)
declare fp128 @llvm.log10.f128(fp128 %Val)
declare ppc_fp128 @llvm.log10.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.log10.*’ intrinsics compute the base-10 logarithm of the specified value.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘log10’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.log2.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.log2 on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.log2.f32(float %Val)
declare double @llvm.log2.f64(double %Val)
declare x86_fp80 @llvm.log2.f80(x86_fp80 %Val)
declare fp128 @llvm.log2.f128(fp128 %Val)
declare ppc_fp128 @llvm.log2.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.log2.*’ intrinsics compute the base-2 logarithm of the specified value.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

Return the same value as a corresponding libm ‘log2’ function but without trapping or setting errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.

‘llvm.fma.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.fma on any floating-point or vector of floating-point type. Not all targets support all types, however.

declare float @llvm.fma.f32(float %a, float %b, float %c)
declare double @llvm.fma.f64(double %a, double %b, double %c)
declare x86_fp80 @llvm.fma.f80(x86_fp80 %a, x86_fp80 %b, x86_fp80 %c)
declare fp128 @llvm.fma.f128(fp128 %a, fp128 %b, fp128 %c)
declare ppc_fp128 @llvm.fma.ppcf128(ppc_fp128 %a, ppc_fp128 %b, ppc_fp128 %c)

Overview:

The ‘llvm.fma.*’ intrinsics perform the fused multiply-add operation.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

Return the same value as the IEEE-754 fusedMultiplyAdd operation. This is assumed to not trap or set errno.

When specified with the fast-math-flag ‘afn’, the result may be approximated using a less accurate calculation.
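A fused multiply-add rounds once, after the exact product and sum have been formed. One well-known use of that property is recovering the rounding error of a product, sketched here in C++ with `std::fma` as a stand-in for the intrinsic. The helper name `product_error` is hypothetical, and the sketch assumes `double` arithmetic is evaluated in IEEE-754 binary64 without extended intermediate precision.

```cpp
#include <cmath>

// Returns a*b - p, where p is the rounded product. The fused operation
// forms a*b exactly before subtracting p, so the rounding error of the
// multiplication survives (barring overflow and underflow).
inline double product_error(double a, double b) {
    const double p = a * b;       // rounded to double
    return std::fma(a, b, -p);    // exact a*b minus p, rounded once
}
```

With a = 1 + 2^-27 and b = 1 - 2^-27, the exact product is 1 - 2^-54, which rounds to 1.0, and `product_error` returns -2^-54. Computing `a * b - 1.0` with a separate multiply and subtract would give 0 instead.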

‘llvm.fabs.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.fabs on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.fabs.f32(float %Val)
declare double @llvm.fabs.f64(double %Val)
declare x86_fp80 @llvm.fabs.f80(x86_fp80 %Val)
declare fp128 @llvm.fabs.f128(fp128 %Val)
declare ppc_fp128 @llvm.fabs.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.fabs.*’ intrinsics return the absolute value of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

This function returns the same values as the libm fabs functions would, and handles error conditions in the same way. The returned value is completely identical to the input except for the sign bit; in particular, if the input is a NaN, then the quiet/signaling bit and payload are perfectly preserved.
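
The "identical except for the sign bit" rule can be illustrated on raw bit patterns. The Python sketch below assumes the IEEE-754 binary64 layout (sign in bit 63); the helper names are hypothetical and it operates on integers so that NaN payloads are never touched by floating-point hardware.

```python
import struct

SIGN_BIT = 1 << 63            # sign bit of an IEEE-754 binary64 value
MASK64 = (1 << 64) - 1

def fabs_bits(bits: int) -> int:
    """Clear only the sign bit of a 64-bit pattern; every other bit is kept."""
    return bits & ~SIGN_BIT & MASK64

def double_to_bits(x: float) -> int:
    """Reinterpret a Python float as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a Python float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For a negative signaling NaN with payload 1 (pattern 0xFFF0000000000001), only the top bit changes; the signaling bit and payload are left as they were.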

‘llvm.min.*’ Intrinsics Comparison

Standard:

IEEE-754 and ISO C define several min/max operations, which differ in how they handle qNaN/sNaN and +0.0/-0.0. The differences are listed below:

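============  =======================  =======================  ==================================
ISO C         fmin/fmax                fminimum/fmaximum        fminimum_num/fmaximum_num
============  =======================  =======================  ==================================
IEEE-754      minNum/maxNum (2008)     minimum/maximum (2019)   minimumNumber/maximumNumber (2019)
+0.0 vs -0.0  either one               +0.0 > -0.0              +0.0 > -0.0
NUM vs sNaN   qNaN, invalid exception  qNaN, invalid exception  NUM, invalid exception
qNaN vs sNaN  qNaN, invalid exception  qNaN, invalid exception  qNaN, invalid exception
NUM vs qNaN   NUM, no exception        qNaN, no exception       NUM, no exception
============  =======================  =======================  ==================================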

LLVM Implementation:

LLVM implements all ISO C flavors as listed in this table, except that exceptions are ignored in the default floating-point environment. The constrained versions of the intrinsics respect the exception behavior.

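============  ========================  ========================  ========================
Operation     minnum/maxnum             minimum/maximum           minimumnum/maximumnum
============  ========================  ========================  ========================
NUM vs qNaN   NUM, no exception         qNaN, no exception        NUM, no exception
NUM vs sNaN   qNaN, invalid exception   qNaN, invalid exception   NUM, invalid exception
qNaN vs sNaN  qNaN, invalid exception   qNaN, invalid exception   qNaN, invalid exception
sNaN vs sNaN  qNaN, invalid exception   qNaN, invalid exception   qNaN, invalid exception
+0.0 vs -0.0  +0.0(max)/-0.0(min)       +0.0(max)/-0.0(min)       +0.0(max)/-0.0(min)
NUM vs NUM    larger(max)/smaller(min)  larger(max)/smaller(min)  larger(max)/smaller(min)
============  ========================  ========================  ========================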
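
The result column of the LLVM table can be written down as a small reference model. The Python sketch below covers only the three min flavors and ignores exceptions, as the default floating-point environment does. NaNs are represented by the symbolic stand-ins QNAN and SNAN rather than real bit patterns, and all function names are hypothetical.

```python
import math

QNAN, SNAN = "qNaN", "sNaN"   # symbolic stand-ins, not real NaN encodings

def _is_nan(x) -> bool:
    return x == QNAN or x == SNAN

def _lesser(a: float, b: float) -> float:
    """Smaller of two numbers, with -0.0 ordered before +0.0."""
    if a == b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

def minnum(a, b):
    if a == SNAN or b == SNAN:
        return QNAN                 # any sNaN operand gives qNaN
    if a == QNAN:
        return b                    # b may itself be qNaN
    if b == QNAN:
        return a
    return _lesser(a, b)

def minimum(a, b):
    if _is_nan(a) or _is_nan(b):
        return QNAN                 # NaNs always propagate
    return _lesser(a, b)

def minimumnum(a, b):
    if _is_nan(a) and _is_nan(b):
        return QNAN                 # NaN only when both operands are NaN
    if _is_nan(a):
        return b
    if _is_nan(b):
        return a
    return _lesser(a, b)
```

The max flavors follow the same structure with the comparison reversed and +0.0 preferred over -0.0.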

‘llvm.minnum.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.minnum on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.minnum.f32(float %Val0, float %Val1)
declare double @llvm.minnum.f64(double %Val0, double %Val1)
declare x86_fp80 @llvm.minnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
declare fp128 @llvm.minnum.f128(fp128 %Val0, fp128 %Val1)
declare ppc_fp128 @llvm.minnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)

Overview:

The ‘llvm.minnum.*’ intrinsics return the minimum of the two arguments.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

Follows the semantics of minNum in IEEE-754-2008, except that -0.0 < +0.0 for the purposes of this intrinsic. As for signaling NaNs, per the minNum semantics, if either operand is sNaN, the result is qNaN. This matches the recommended behavior for the libm function fmin, although not all implementations have implemented these recommended behaviors.

If either operand is a qNaN, returns the other non-NaN operand. Returns NaN only if both operands are NaN or if either operand is sNaN. Note that arithmetic on an sNaN doesn’t consistently produce a qNaN, so arithmetic feeding into a minnum can produce inconsistent results. For example, minnum(fadd(sNaN, -0.0), 1.0) can produce qNaN or 1.0 depending on whether fadd is folded.

IEEE-754-2008 defines minNum, and it was removed in IEEE-754-2019. As the replacement, IEEE-754-2019 defines minimumNumber.

If the intrinsic is marked with the nsz attribute, then the effect is as in the definition in C and IEEE-754-2008: the result of minnum(-0.0, +0.0) may be either -0.0 or +0.0.

Some architectures, such as ARMv8 (FMINNM), LoongArch (fmin), MIPSr6 (min.fmt), and PowerPC/VSX (xsmindp), have instructions that match these semantics exactly, so lowering is straightforward on them. Other architectures have similar instructions that are not exact equivalents. For example, x86 implements MINPS, which implements the semantics of the C code a < b ? a : b: whenever either operand is a NaN, the second operand is returned, so NUM vs qNaN can return qNaN. MINPS can be used if nsz and nnan are given.

For existing libc implementations, the behavior of fmin may differ considerably with respect to sNaN and signed zeros, even within the same release of a single libm implementation.

‘llvm.maxnum.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.maxnum on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.maxnum.f32(float %Val0, float %Val1)
declare double @llvm.maxnum.f64(double %Val0, double %Val1)
declare x86_fp80 @llvm.maxnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
declare fp128 @llvm.maxnum.f128(fp128 %Val0, fp128 %Val1)
declare ppc_fp128 @llvm.maxnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)

Overview:

The ‘llvm.maxnum.*’ intrinsics return the maximum of the two arguments.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

Follows the semantics of maxNum in IEEE-754-2008, except that -0.0 < +0.0 for the purposes of this intrinsic. As for signaling NaNs, per the maxNum semantics, if either operand is sNaN, the result is qNaN. This matches the recommended behavior for the libm function fmax, although not all implementations have implemented these recommended behaviors.

If either operand is a qNaN, returns the other non-NaN operand. Returns NaN only if both operands are NaN or if either operand is sNaN. Note that arithmetic on an sNaN doesn’t consistently produce a qNaN, so arithmetic feeding into a maxnum can produce inconsistent results. For example, maxnum(fadd(sNaN, -0.0), 1.0) can produce qNaN or 1.0 depending on whether fadd is folded.

IEEE-754-2008 defines maxNum, and it was removed in IEEE-754-2019. As the replacement, IEEE-754-2019 defines maximumNumber.

If the intrinsic is marked with the nsz attribute, then the effect is as in the definition in C and IEEE-754-2008: the result of maxnum(-0.0, +0.0) may be either -0.0 or +0.0.

Some architectures, such as ARMv8 (FMAXNM), LoongArch (fmax), MIPSr6 (max.fmt), and PowerPC/VSX (xsmaxdp), have instructions that match these semantics exactly, so lowering is straightforward on them. Other architectures have similar instructions that are not exact equivalents. For example, x86 implements MAXPS, which implements the semantics of the C code a > b ? a : b: whenever either operand is a NaN, the second operand is returned, so NUM vs qNaN can return qNaN. MAXPS can be used if nsz and nnan are given.

For existing libc implementations, the behavior of fmax may differ considerably with respect to sNaN and signed zeros, even within the same release of a single libm implementation.

‘llvm.minimum.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.minimum on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.minimum.f32(float %Val0, float %Val1)
declare double @llvm.minimum.f64(double %Val0, double %Val1)
declare x86_fp80 @llvm.minimum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
declare fp128 @llvm.minimum.f128(fp128 %Val0, fp128 %Val1)
declare ppc_fp128 @llvm.minimum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)

Overview:

The ‘llvm.minimum.*’ intrinsics return the minimum of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

If either operand is a NaN, returns NaN. Otherwise returns the lesser of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019.

‘llvm.maximum.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.maximum on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.maximum.f32(float %Val0, float %Val1)
declare double @llvm.maximum.f64(double %Val0, double %Val1)
declare x86_fp80 @llvm.maximum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
declare fp128 @llvm.maximum.f128(fp128 %Val0, fp128 %Val1)
declare ppc_fp128 @llvm.maximum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)

Overview:

The ‘llvm.maximum.*’ intrinsics return the maximum of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

If either operand is a NaN, returns NaN. Otherwise returns the greater of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019.

‘llvm.minimumnum.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.minimumnum on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.minimumnum.f32(float %Val0, float %Val1)
declare double @llvm.minimumnum.f64(double %Val0, double %Val1)
declare x86_fp80 @llvm.minimumnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
declare fp128 @llvm.minimumnum.f128(fp128 %Val0, fp128 %Val1)
declare ppc_fp128 @llvm.minimumnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)

Overview:

The ‘llvm.minimumnum.*’ intrinsics return the minimum of the two arguments, not propagating NaNs and treating -0.0 as less than +0.0.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

If both operands are NaNs (including sNaN), returns a NaN. If one operand is NaN (including sNaN) and another operand is a number, return the number. Otherwise returns the lesser of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic.

Note that these are the semantics of minimumNumber specified in IEEE-754-2019 with the usual signaling NaN exception.

It has some differences with ‘llvm.minnum.*’:

1) ‘llvm.minnum.*’ will return qNaN if either operand is sNaN.
2) ‘llvm.minnum.*’ follows the C and IEEE-754-2008 definition when marked with nsz: comparing +0.0 vs -0.0 may then return either one.

‘llvm.maximumnum.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.maximumnum on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.maximumnum.f32(float %Val0, float %Val1)
declare double @llvm.maximumnum.f64(double %Val0, double %Val1)
declare x86_fp80 @llvm.maximumnum.f80(x86_fp80 %Val0, x86_fp80 %Val1)
declare fp128 @llvm.maximumnum.f128(fp128 %Val0, fp128 %Val1)
declare ppc_fp128 @llvm.maximumnum.ppcf128(ppc_fp128 %Val0, ppc_fp128 %Val1)

Overview:

The ‘llvm.maximumnum.*’ intrinsics return the maximum of the two arguments, not propagating NaNs and treating -0.0 as less than +0.0.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

If both operands are NaNs (including sNaN), returns a NaN. If one operand is NaN (including sNaN) and another operand is a number, return the number. Otherwise returns the greater of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic.

Note that these are the semantics of maximumNumber specified in IEEE-754-2019 with the usual signaling NaN exception.

It has some differences with ‘llvm.maxnum.*’:

1) ‘llvm.maxnum.*’ will return qNaN if either operand is sNaN.
2) ‘llvm.maxnum.*’ follows the C and IEEE-754-2008 definition when marked with nsz: comparing +0.0 vs -0.0 may then return either one.

‘llvm.copysign.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.copysign on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.copysign.f32(float %Mag, float %Sgn)
declare double @llvm.copysign.f64(double %Mag, double %Sgn)
declare x86_fp80 @llvm.copysign.f80(x86_fp80 %Mag, x86_fp80 %Sgn)
declare fp128 @llvm.copysign.f128(fp128 %Mag, fp128 %Sgn)
declare ppc_fp128 @llvm.copysign.ppcf128(ppc_fp128 %Mag, ppc_fp128 %Sgn)

Overview:

The ‘llvm.copysign.*’ intrinsics return a value with the magnitude of the first operand and the sign of the second operand.

Arguments:

The arguments and return value are floating-point numbers of the same type.

Semantics:

This function returns the same values as the libm copysign functions would, and handles error conditions in the same way. The returned value is completely identical to the first operand except for the sign bit; in particular, if the input is a NaN, then the quiet/signaling bit and payload are perfectly preserved.
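
On raw bit patterns this amounts to taking every bit except the sign from the first operand and the sign bit from the second. The Python sketch below assumes the IEEE-754 binary64 layout; copysign_bits is a hypothetical helper that works on 64-bit integers so that NaN payloads pass through unchanged.

```python
SIGN_BIT = 1 << 63            # sign bit of an IEEE-754 binary64 value
MASK64 = (1 << 64) - 1

def copysign_bits(mag: int, sgn: int) -> int:
    """Magnitude bits from mag, sign bit from sgn (both 64-bit patterns)."""
    return ((mag & ~SIGN_BIT) | (sgn & SIGN_BIT)) & MASK64
```

For instance, combining a positive signaling NaN (0x7FF0000000000001) with the pattern of -0.0 (0x8000000000000000) changes only the top bit.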

‘llvm.floor.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.floor on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.floor.f32(float %Val)
declare double @llvm.floor.f64(double %Val)
declare x86_fp80 @llvm.floor.f80(x86_fp80 %Val)
declare fp128 @llvm.floor.f128(fp128 %Val)
declare ppc_fp128 @llvm.floor.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.floor.*’ intrinsics return the floor of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

This function returns the same values as the libm floor functions would, and handles error conditions in the same way.

‘llvm.ceil.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.ceil on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.ceil.f32(float %Val)
declare double @llvm.ceil.f64(double %Val)
declare x86_fp80 @llvm.ceil.f80(x86_fp80 %Val)
declare fp128 @llvm.ceil.f128(fp128 %Val)
declare ppc_fp128 @llvm.ceil.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.ceil.*’ intrinsics return the ceiling of the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

This function returns the same values as the libm ceil functions would, and handles error conditions in the same way.

‘llvm.trunc.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.trunc on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.trunc.f32(float %Val)
declare double @llvm.trunc.f64(double %Val)
declare x86_fp80 @llvm.trunc.f80(x86_fp80 %Val)
declare fp128 @llvm.trunc.f128(fp128 %Val)
declare ppc_fp128 @llvm.trunc.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.trunc.*’ intrinsics return the operand rounded to the nearest integer not larger in magnitude than the operand.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

This function returns the same values as the libm trunc functions would, and handles error conditions in the same way.

‘llvm.rint.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.rint on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.rint.f32(float %Val)
declare double @llvm.rint.f64(double %Val)
declare x86_fp80 @llvm.rint.f80(x86_fp80 %Val)
declare fp128 @llvm.rint.f128(fp128 %Val)
declare ppc_fp128 @llvm.rint.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.rint.*’ intrinsics return the operand rounded to the nearest integer. It may raise an inexact floating-point exception if the operand isn’t an integer.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

This function returns the same values as the libm rint functions would, and handles error conditions in the same way. Since LLVM assumes the default floating-point environment, the rounding mode is assumed to be set to “nearest”, so halfway cases are rounded to the even integer. Use Constrained Floating-Point Intrinsics to avoid that assumption.

‘llvm.nearbyint.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.nearbyint on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.nearbyint.f32(float %Val)
declare double @llvm.nearbyint.f64(double %Val)
declare x86_fp80 @llvm.nearbyint.f80(x86_fp80 %Val)
declare fp128 @llvm.nearbyint.f128(fp128 %Val)
declare ppc_fp128 @llvm.nearbyint.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.nearbyint.*’ intrinsics return the operand rounded to the nearest integer.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

This function returns the same values as the libm nearbyint functions would, and handles error conditions in the same way. Since LLVM assumes the default floating-point environment, the rounding mode is assumed to be set to “nearest”, so halfway cases are rounded to the even integer. Use Constrained Floating-Point Intrinsics to avoid that assumption.

‘llvm.round.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.round on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.round.f32(float %Val)
declare double @llvm.round.f64(double %Val)
declare x86_fp80 @llvm.round.f80(x86_fp80 %Val)
declare fp128 @llvm.round.f128(fp128 %Val)
declare ppc_fp128 @llvm.round.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.round.*’ intrinsics return the operand rounded to the nearest integer.

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

This function returns the same values as the libm round functions would, and handles error conditions in the same way.

‘llvm.roundeven.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.roundeven on any floating-point or vector of floating-point type. Not all targets support all types however.

declare float @llvm.roundeven.f32(float %Val)
declare double @llvm.roundeven.f64(double %Val)
declare x86_fp80 @llvm.roundeven.f80(x86_fp80 %Val)
declare fp128 @llvm.roundeven.f128(fp128 %Val)
declare ppc_fp128 @llvm.roundeven.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.roundeven.*’ intrinsics return the operand rounded to the nearest integer in floating-point format, rounding halfway cases to even (that is, to the nearest value that is an even integer).

Arguments:

The argument and return value are floating-point numbers of the same type.

Semantics:

This function implements the IEEE-754 operation roundToIntegralTiesToEven. It also behaves in the same way as the C standard function roundeven, including that it disregards rounding mode and does not raise floating point exceptions.
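
The rounding intrinsics differ mainly in how they treat halfway cases and negative values. The Python sketch below places floor, ceil, and trunc next to two hypothetical helpers: round_away models the ties-away-from-zero rule of the libm round function, and round_even models ties-to-even. The all_modes helper accepts finite inputs only.

```python
import math

def round_away(x: float) -> float:
    """Round to nearest, ties away from zero (the libm round rule)."""
    if not math.isfinite(x):
        return x
    t = math.trunc(x)                 # integer part, exact
    if abs(x - t) >= 0.5:             # the fractional part is exact as well
        t += 1 if x > 0 else -1
    return math.copysign(float(t), x)  # keeps the sign of a zero result

def round_even(x: float) -> float:
    """Round to nearest, ties to the even integer."""
    if not math.isfinite(x):
        return x
    # Python's built-in round() uses ties-to-even for floats.
    return math.copysign(float(round(x)), x)

def all_modes(x: float):
    """(floor, ceil, trunc, ties-away, ties-even) for a finite x."""
    return (float(math.floor(x)), float(math.ceil(x)), float(math.trunc(x)),
            round_away(x), round_even(x))
```

Evaluating all_modes on 2.5 and -2.5 shows that the five rules disagree only in the direction they move a halfway value.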

‘llvm.lround.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.lround on any floating-point type or vector of floating-point type. Not all targets support all types however.

declare i32 @llvm.lround.i32.f32(float %Val)
declare i32 @llvm.lround.i32.f64(double %Val)
declare i32 @llvm.lround.i32.f80(x86_fp80 %Val)
declare i32 @llvm.lround.i32.f128(fp128 %Val)
declare i32 @llvm.lround.i32.ppcf128(ppc_fp128 %Val)
declare i64 @llvm.lround.i64.f32(float %Val)
declare i64 @llvm.lround.i64.f64(double %Val)
declare i64 @llvm.lround.i64.f80(x86_fp80 %Val)
declare i64 @llvm.lround.i64.f128(fp128 %Val)
declare i64 @llvm.lround.i64.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.lround.*’ intrinsics return the operand rounded to the nearest integer with ties away from zero.

Arguments:

The argument is a floating-point number and the return value is an integer type.

Semantics:

This function returns the same values as the libm lround functions would, but without setting errno. If the rounded value is too large to be stored in the result type, the return value is a non-deterministic value (equivalent to freeze poison).
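
The range check can be sketched as follows. In this Python model, lround_model is a hypothetical helper, bits is the width of the integer result type, and None stands in for the non-deterministic result. Treating NaN and infinities the same way as an out-of-range value is an assumption of the sketch, not something stated above.

```python
import math

def lround_model(x: float, bits: int):
    """Round to nearest with ties away from zero; None marks 'no defined value'."""
    if not math.isfinite(x):
        return None                    # assumption: treated like out-of-range
    t = math.trunc(x)
    if abs(x - t) >= 0.5:
        t += 1 if x > 0 else -1
    lo = -(1 << (bits - 1))            # smallest signed value of the result type
    hi = (1 << (bits - 1)) - 1         # largest signed value of the result type
    return t if lo <= t <= hi else None
```

The same input can therefore be well-defined for an i64 result and undefined for an i32 result.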

‘llvm.llround.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.llround on any floating-point type. Not all targets support all types however.

declare i64 @llvm.llround.i64.f32(float %Val)
declare i64 @llvm.llround.i64.f64(double %Val)
declare i64 @llvm.llround.i64.f80(x86_fp80 %Val)
declare i64 @llvm.llround.i64.f128(fp128 %Val)
declare i64 @llvm.llround.i64.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.llround.*’ intrinsics return the operand rounded to the nearest integer with ties away from zero.

Arguments:

The argument is a floating-point number and the return value is an integer type.

Semantics:

This function returns the same values as the libm llround functions would, but without setting errno. If the rounded value is too large to be stored in the result type, the return value is a non-deterministic value (equivalent to freeze poison).

‘llvm.lrint.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.lrint on any floating-point type or vector of floating-point type. Not all targets support all types however.

declare i32 @llvm.lrint.i32.f32(float %Val)
declare i32 @llvm.lrint.i32.f64(double %Val)
declare i32 @llvm.lrint.i32.f80(x86_fp80 %Val)
declare i32 @llvm.lrint.i32.f128(fp128 %Val)
declare i32 @llvm.lrint.i32.ppcf128(ppc_fp128 %Val)
declare i64 @llvm.lrint.i64.f32(float %Val)
declare i64 @llvm.lrint.i64.f64(double %Val)
declare i64 @llvm.lrint.i64.f80(x86_fp80 %Val)
declare i64 @llvm.lrint.i64.f128(fp128 %Val)
declare i64 @llvm.lrint.i64.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.lrint.*’ intrinsics return the operand rounded to the nearest integer.

Arguments:

The argument is a floating-point number and the return value is an integer type.

Semantics:

This function returns the same values as the libm lrint functions would, but without setting errno. If the rounded value is too large to be stored in the result type, the return value is a non-deterministic value (equivalent to freeze poison).

‘llvm.llrint.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.llrint on any floating-point type or vector of floating-point type. Not all targets support all types however.

declare i64 @llvm.llrint.i64.f32(float %Val)
declare i64 @llvm.llrint.i64.f64(double %Val)
declare i64 @llvm.llrint.i64.f80(x86_fp80 %Val)
declare i64 @llvm.llrint.i64.f128(fp128 %Val)
declare i64 @llvm.llrint.i64.ppcf128(ppc_fp128 %Val)

Overview:

The ‘llvm.llrint.*’ intrinsics return the operand rounded to the nearest integer.

Arguments:

The argument is a floating-point number and the return value is an integer type.

Semantics:

This function returns the same values as the libm llrint functions would, but without setting errno. If the rounded value is too large to be stored in the result type, the return value is a non-deterministic value (equivalent to freeze poison).

Bit Manipulation Intrinsics

LLVM provides intrinsics for a few important bit manipulation operations. These allow efficient code generation for some algorithms.

‘llvm.bitreverse.*’ Intrinsics

Syntax:

This is an overloaded intrinsic function. You can use bitreverse on any integer type.

declare i16 @llvm.bitreverse.i16(i16 <id>)
declare i32 @llvm.bitreverse.i32(i32 <id>)
declare i64 @llvm.bitreverse.i64(i64 <id>)
declare <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> <id>)

Overview:

The ‘llvm.bitreverse’ family of intrinsics is used to reverse the bitpattern of an integer value or vector of integer values; for example 0b10110110 becomes 0b01101101.

Semantics:

The llvm.bitreverse.iN intrinsic returns an iN value that has bit M in the input moved to bit N-M-1 in the output. The vector intrinsics, such as llvm.bitreverse.v4i32, operate on a per-element basis and the element order is not affected.
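
The bit mapping M to N-M-1 can be checked with a short scalar model. In this Python sketch, bitreverse is a hypothetical helper and n is the bit width N of the integer type.

```python
def bitreverse(x: int, n: int) -> int:
    """Move bit M of an n-bit value to bit n-M-1."""
    result = 0
    for m in range(n):
        if (x >> m) & 1:
            result |= 1 << (n - m - 1)
    return result
```

Applied to the 8-bit example from the overview, bitreverse(0b10110110, 8) gives 0b01101101, and applying the function twice returns the original value.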

‘llvm.bswap.*’ Intrinsics

Syntax:

This is an overloaded intrinsic function. You can use bswap on any integer type that is an even number of bytes (i.e. BitWidth % 16 == 0).

declare i16 @llvm.bswap.i16(i16 <id>)
declare i32 @llvm.bswap.i32(i32 <id>)
declare i64 @llvm.bswap.i64(i64 <id>)
declare <4 x i32> @llvm.bswap.v4i32(<4 x i32> <id>)

Overview:

The ‘llvm.bswap’ family of intrinsics is used to byte swap an integer value or vector of integer values with an even number of bytes (positive multiple of 16 bits).

Semantics:

The llvm.bswap.i16 intrinsic returns an i16 value that has the high and low byte of the input i16 swapped. Similarly, the llvm.bswap.i32 intrinsic returns an i32 value that has the four bytes of the input i32 swapped, so that if the input bytes are numbered 0, 1, 2, 3 then the returned i32 will have its bytes in 3, 2, 1, 0 order. The llvm.bswap.i48, llvm.bswap.i64 and other intrinsics extend this concept to additional even-byte lengths (6 bytes, 8 bytes and more, respectively). The vector intrinsics, such as llvm.bswap.v4i32, operate on a per-element basis and the element order is not affected.
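
A scalar model of the byte reversal, including the 6-byte case, might look like the Python sketch below. The helper name bswap is hypothetical, and n is the bit width, which must be a positive multiple of 16.

```python
def bswap(x: int, n: int) -> int:
    """Reverse the byte order of an n-bit value (n a positive multiple of 16)."""
    if n <= 0 or n % 16 != 0:
        raise ValueError("bit width must be a positive multiple of 16")
    # Serialize most-significant byte first, then read it back the other way.
    return int.from_bytes(x.to_bytes(n // 8, "big"), "little")
```

For example, the i32 value 0x11223344 becomes 0x44332211.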

‘llvm.ctpop.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.ctpop on any integer bit width, or on any vector with integer elements. Not all targets support all bit widths or vector types, however.

declare i8 @llvm.ctpop.i8(i8 <src>)
declare i16 @llvm.ctpop.i16(i16 <src>)
declare i32 @llvm.ctpop.i32(i32 <src>)
declare i64 @llvm.ctpop.i64(i64 <src>)
declare i256 @llvm.ctpop.i256(i256 <src>)
declare <2 x i32> @llvm.ctpop.v2i32(<2 x i32> <src>)

Overview:

The ‘llvm.ctpop’ family of intrinsics counts the number of bits set in a value.

Arguments:

The only argument is the value to be counted. The argument may be of any integer type, or a vector with integer elements. The return type must match the argument type.

Semantics:

The ‘llvm.ctpop’ intrinsic counts the 1’s in a variable, or within each element of a vector.
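
As a scalar reference, the population count can be modeled in Python as below. The helper name ctpop is hypothetical and n is the bit width of the integer type; the value is masked to n bits first.

```python
def ctpop(x: int, n: int) -> int:
    """Count the 1 bits in the low n bits of x."""
    return bin(x & ((1 << n) - 1)).count("1")
```

A vector version would simply apply this to each element independently.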

‘llvm.ctlz.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.ctlz on any integer bit width, or any vector whose elements are integers. Not all targets support all bit widths or vector types, however.

declare i8 @llvm.ctlz.i8(i8 <src>, i1 <is_zero_poison>)
declare <2 x i37> @llvm.ctlz.v2i37(<2 x i37> <src>, i1 <is_zero_poison>)

Overview:

The ‘llvm.ctlz’ family of intrinsic functions counts the number of leading zeros in a variable.

Arguments:

The first argument is the value to be counted. This argument may be of any integer type, or a vector with integer element type. The return type must match the first argument type.

The second argument is a constant flag that indicates whether the intrinsic returns a valid result if the first argument is zero. If the first argument is zero and the second argument is true, the result is poison. Historically some architectures did not provide a defined result for zero values as efficiently, and many algorithms are now predicated on avoiding zero-value inputs.

Semantics:

The ‘llvm.ctlz’ intrinsic counts the leading (most significant) zeros in a variable, or within each element of the vector. If src == 0 then the result is the size in bits of the type of src if is_zero_poison == 0 and poison otherwise. For example, llvm.ctlz(i32 2) = 30.
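
The zero-input rule is the only subtle part. The Python sketch below models it with a hypothetical helper ctlz, where n is the bit width and None stands in for a poison result.

```python
def ctlz(x: int, n: int, is_zero_poison: bool = False):
    """Count leading zeros of an n-bit value; None stands in for poison."""
    x &= (1 << n) - 1
    if x == 0:
        return None if is_zero_poison else n
    # bit_length() is the position of the highest set bit, plus one.
    return n - x.bit_length()
```

With is_zero_poison set, callers are expected to have ruled out a zero input before relying on the result.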

‘llvm.cttz.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.cttz on any integer bit width, or any vector of integer elements. Not all targets support all bit widths or vector types, however.

declare i42 @llvm.cttz.i42(i42 <src>, i1 <is_zero_poison>)
declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>, i1 <is_zero_poison>)

Overview:

The ‘llvm.cttz’ family of intrinsic functions counts the number of trailing zeros.

Arguments:

The first argument is the value to be counted. This argument may be of any integer type, or a vector with integer element type. The return type must match the first argument type.

The second argument is a constant flag that indicates whether the intrinsic returns a valid result if the first argument is zero. If the first argument is zero and the second argument is true, the result is poison. Historically some architectures did not provide a defined result for zero values as efficiently, and many algorithms are now predicated on avoiding zero-value inputs.

Semantics:

The ‘llvm.cttz’ intrinsic counts the trailing (least significant) zeros in a variable, or within each element of a vector. If src == 0 then the result is the size in bits of the type of src if is_zero_poison == 0 and poison otherwise. For example, llvm.cttz(2) = 1.
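
A matching scalar model for trailing zeros is sketched below in Python. The helper name cttz is hypothetical, n is the bit width, and None again stands in for poison.

```python
def cttz(x: int, n: int, is_zero_poison: bool = False):
    """Count trailing zeros of an n-bit value; None stands in for poison."""
    x &= (1 << n) - 1
    if x == 0:
        return None if is_zero_poison else n
    # x & -x isolates the lowest set bit; its index is the trailing-zero count.
    return (x & -x).bit_length() - 1
```

The non-power-of-two width i42 from the declarations above works the same way as any other width.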

‘llvm.fshl.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.fshl on any integer bit width or any vector of integer elements. Not all targets support all bit widths or vector types, however.

declare i8 @llvm.fshl.i8(i8 %a, i8 %b, i8 %c)
declare i64 @llvm.fshl.i64(i64 %a, i64 %b, i64 %c)
declare <2 x i32> @llvm.fshl.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %c)

Overview:

The ‘llvm.fshl’ family of intrinsic functions performs a funnel shift left: the first two values are concatenated as { %a : %b } (%a is the most significant bits of the wide value), the combined value is shifted left, and the most significant bits are extracted to produce a result that is the same size as the original arguments. If the first 2 arguments are identical, this is equivalent to a rotate left operation. For vector types, the operation occurs for each element of the vector. The shift argument is treated as an unsigned amount modulo the element size of the arguments.

Arguments:

The first two arguments are the values to be concatenated. The third argument is the shift amount. The arguments may be any integer type or a vector with integer element type. All arguments and the return value must have the same type.

Example:
%r = call i8 @llvm.fshl.i8(i8 %x, i8 %y, i8 %z)  ; %r = i8: msb_extract((concat(x, y) << (z % 8)), 8)
%r = call i8 @llvm.fshl.i8(i8 255, i8 0, i8 15)  ; %r = i8: 128 (0b10000000)
%r = call i8 @llvm.fshl.i8(i8 15, i8 15, i8 11)  ; %r = i8: 120 (0b01111000)
%r = call i8 @llvm.fshl.i8(i8 0, i8 255, i8 8)   ; %r = i8: 0   (0b00000000)
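
The concatenate, shift, and extract steps can be reproduced with a short scalar model. In the Python sketch below, fshl is a hypothetical helper and n is the element bit width; the inputs are assumed to already fit in n bits.

```python
def fshl(a: int, b: int, c: int, n: int) -> int:
    """Funnel shift left: top n bits of (concat(a, b) << (c % n))."""
    shift = c % n                      # shift amount is taken modulo the width
    wide = (a << n) | b                # a supplies the most significant bits
    return ((wide << shift) >> n) & ((1 << n) - 1)
```

Passing the same value for both inputs gives a rotate left, for example fshl(0b10000001, 0b10000001, 1, 8) yields 0b00000011.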

‘llvm.fshr.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.fshr on any integer bit width or any vector of integer elements. Not all targets support all bit widths or vector types, however.

declare i8 @llvm.fshr.i8(i8 %a, i8 %b, i8 %c)
declare i64 @llvm.fshr.i64(i64 %a, i64 %b, i64 %c)
declare <2 x i32> @llvm.fshr.v2i32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %c)

Overview:

The ‘llvm.fshr’ family of intrinsic functions performs a funnel shift right: the first two values are concatenated as { %a : %b } (%a is the most significant bits of the wide value), the combined value is shifted right, and the least significant bits are extracted to produce a result that is the same size as the original arguments. If the first 2 arguments are identical, this is equivalent to a rotate right operation. For vector types, the operation occurs for each element of the vector. The shift argument is treated as an unsigned amount modulo the element size of the arguments.

Arguments:

The first two arguments are the values to be concatenated. The third argument is the shift amount. The arguments may be any integer type or a vector with integer element type. All arguments and the return value must have the same type.

Example:
%r = call i8 @llvm.fshr.i8(i8 %x, i8 %y, i8 %z)  ; %r = i8: lsb_extract((concat(x, y) >> (z % 8)), 8)
%r = call i8 @llvm.fshr.i8(i8 255, i8 0, i8 15)  ; %r = i8: 254 (0b11111110)
%r = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11)  ; %r = i8: 225 (0b11100001)
%r = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8)   ; %r = i8: 255 (0b11111111)
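
The right-shifting variant differs only in the shift direction and in which half of the wide value is kept. In the Python sketch below, fshr is a hypothetical helper and n is the element bit width; the inputs are assumed to already fit in n bits.

```python
def fshr(a: int, b: int, c: int, n: int) -> int:
    """Funnel shift right: low n bits of (concat(a, b) >> (c % n))."""
    shift = c % n                      # shift amount is taken modulo the width
    wide = (a << n) | b                # a supplies the most significant bits
    return (wide >> shift) & ((1 << n) - 1)
```

Note that a shift amount equal to the width wraps to zero, so the result is then the second argument unchanged.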

Arithmetic with Overflow Intrinsics

LLVM provides intrinsics for fast arithmetic overflow checking.

Each of these intrinsics returns a two-element struct. The first element of this struct contains the result of the corresponding arithmetic operation modulo 2^n, where n is the bit width of the result. Therefore, for example, the first element of the struct returned by llvm.sadd.with.overflow.i32 is always the same as the result of a 32-bit add instruction with the same operands, where the add is not modified by an nsw or nuw flag.

The second element of the result is an i1 that is 1 if the arithmetic operation overflowed and 0 otherwise. An operation overflows if, for any values of its operands A and B and for any N larger than the operands’ width, ext(A op B) to iN is not equal to (ext(A) to iN) op (ext(B) to iN) where ext is sext for signed overflow and zext for unsigned overflow, and op is the underlying arithmetic operation.

The behavior of these intrinsics is well-defined for all argument values.
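
The overflow definition compares the wrapped n-bit result with the result computed at a larger width. Python integers are unbounded, so the wide computation is simply the exact one. The sketch below models three members of the family; the function names are hypothetical, n is the operand bit width, and signed values are passed as ordinary Python integers within the n-bit range.

```python
def wrap_signed(x: int, n: int) -> int:
    """Reduce x modulo 2**n and reinterpret the result as a signed n-bit value."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x

def sadd_with_overflow(a: int, b: int, n: int):
    exact = a + b                       # the sum at unbounded width
    result = wrap_signed(exact, n)      # what an n-bit add would produce
    return result, result != exact

def uadd_with_overflow(a: int, b: int, n: int):
    exact = a + b
    result = exact & ((1 << n) - 1)
    return result, result != exact      # the flag is the carry out

def ssub_with_overflow(a: int, b: int, n: int):
    exact = a - b
    result = wrap_signed(exact, n)
    return result, result != exact
```

Each function returns a pair that mirrors the struct described above: the wrapped result first, then the overflow bit.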

‘llvm.sadd.with.overflow.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.sadd.with.overflow on any integer bit width or vectors of integers.

declare {i16, i1} @llvm.sadd.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.sadd.with.overflow.i64(i64 %a, i64 %b)
declare {<4 x i32>, <4 x i1>} @llvm.sadd.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.sadd.with.overflow’ family of intrinsic functions perform a signed addition of the two arguments, and indicate whether an overflow occurred during the signed summation.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed addition.

Semantics:

The ‘llvm.sadd.with.overflow’ family of intrinsic functions perform a signed addition of the two variables. They return a structure whose first element is the signed summation and whose second element is a bit specifying if the signed summation resulted in an overflow.

Examples:
%res = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
%sum = extractvalue {i32, i1} %res, 0
%obit = extractvalue {i32, i1} %res, 1
br i1 %obit, label %overflow, label %normal

‘llvm.uadd.with.overflow.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.uadd.with.overflow on any integer bit width or vectors of integers.

declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
declare {<4 x i32>, <4 x i1>} @llvm.uadd.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.uadd.with.overflow’ family of intrinsic functions perform an unsigned addition of the two arguments, and indicate whether a carry occurred during the unsigned summation.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned addition.

Semantics:

The ‘llvm.uadd.with.overflow’ family of intrinsic functions perform an unsigned addition of the two arguments. They return a structure whose first element is the sum and whose second element is a bit specifying whether the unsigned summation resulted in a carry.

Examples:

%res = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
%sum = extractvalue {i32, i1} %res, 0
%obit = extractvalue {i32, i1} %res, 1
br i1 %obit, label %carry, label %normal
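One common use of the carry bit is chaining limbs in wide addition. The following Python sketch models that pattern under stated assumptions: `uadd_with_overflow_i64` is a hypothetical stand-in for the i64 intrinsic, and `add_u128` is an illustrative helper, not part of LLVM.

```python
MASK64 = (1 << 64) - 1


def uadd_with_overflow_i64(a: int, b: int):
    """Model of llvm.uadd.with.overflow.i64: (sum modulo 2**64, carry bit)."""
    total = a + b
    return total & MASK64, total > MASK64


def add_u128(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    """Add two 128-bit values given as (low, high) 64-bit limbs.

    Returns (low limb, high limb, carry out of the high limb).
    """
    lo, carry = uadd_with_overflow_i64(a_lo, b_lo)
    hi, c1 = uadd_with_overflow_i64(a_hi, b_hi)
    hi, c2 = uadd_with_overflow_i64(hi, int(carry))
    # At most one of c1 and c2 can be set for a single-bit carry-in.
    return lo, hi, c1 or c2
```

The carry from the low limbs feeds the high limbs, which is the same role the second struct element plays when the intrinsic's result is consumed by a following addition.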

‘llvm.ssub.with.overflow.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.ssub.with.overflow on any integer bit width or vectors of integers.

declare {i16, i1} @llvm.ssub.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.ssub.with.overflow.i64(i64 %a, i64 %b)
declare {<4 x i32>, <4 x i1>} @llvm.ssub.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.ssub.with.overflow’ family of intrinsic functions perform a signed subtraction of the two arguments, and indicate whether an overflow occurred during the signed subtraction.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed subtraction.

Semantics:

The ‘llvm.ssub.with.overflow’ family of intrinsic functions perform a signed subtraction of the two arguments. They return a structure whose first element is the difference and whose second element is a bit specifying whether the signed subtraction resulted in an overflow.

Examples:

%res = call {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
%sum = extractvalue {i32, i1} %res, 0
%obit = extractvalue {i32, i1} %res, 1
br i1 %obit, label %overflow, label %normal

‘llvm.usub.with.overflow.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.usub.with.overflow on any integer bit width or vectors of integers.

declare {i16, i1} @llvm.usub.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.usub.with.overflow.i64(i64 %a, i64 %b)
declare {<4 x i32>, <4 x i1>} @llvm.usub.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.usub.with.overflow’ family of intrinsic functions perform an unsigned subtraction of the two arguments, and indicate whether an overflow occurred during the unsigned subtraction.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned subtraction.

Semantics:

The ‘llvm.usub.with.overflow’ family of intrinsic functions perform an unsigned subtraction of the two arguments. They return a structure whose first element is the difference and whose second element is a bit specifying whether the unsigned subtraction resulted in an overflow.

Examples:

%res = call {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
%sum = extractvalue {i32, i1} %res, 0
%obit = extractvalue {i32, i1} %res, 1
br i1 %obit, label %overflow, label %normal

‘llvm.smul.with.overflow.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.smul.with.overflow on any integer bit width or vectors of integers.

declare {i16, i1} @llvm.smul.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.smul.with.overflow.i64(i64 %a, i64 %b)
declare {<4 x i32>, <4 x i1>} @llvm.smul.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.smul.with.overflow’ family of intrinsic functions perform a signed multiplication of the two arguments, and indicate whether an overflow occurred during the signed multiplication.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed multiplication.

Semantics:

The ‘llvm.smul.with.overflow’ family of intrinsic functions perform a signed multiplication of the two arguments. They return a structure whose first element is the product and whose second element is a bit specifying whether the signed multiplication resulted in an overflow.

Examples:

%res = call {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
%sum = extractvalue {i32, i1} %res, 0
%obit = extractvalue {i32, i1} %res, 1
br i1 %obit, label %overflow, label %normal

‘llvm.umul.with.overflow.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.umul.with.overflow on any integer bit width or vectors of integers.

declare {i16, i1} @llvm.umul.with.overflow.i16(i16 %a, i16 %b)
declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
declare {i64, i1} @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
declare {<4 x i32>, <4 x i1>} @llvm.umul.with.overflow.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.umul.with.overflow’ family of intrinsic functions perform an unsigned multiplication of the two arguments, and indicate whether an overflow occurred during the unsigned multiplication.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned multiplication.

Semantics:

The ‘llvm.umul.with.overflow’ family of intrinsic functions perform an unsigned multiplication of the two arguments. They return a structure whose first element is the product and whose second element is a bit specifying whether the unsigned multiplication resulted in an overflow.

Examples:

%res = call {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
%sum = extractvalue {i32, i1} %res, 0
%obit = extractvalue {i32, i1} %res, 1
br i1 %obit, label %overflow, label %normal

Saturation Arithmetic Intrinsics

Saturation arithmetic is a version of arithmetic in which operations are limited to a fixed range between a minimum and maximum value. If the result of an operation is greater than the maximum value, the result is set (or “clamped”) to this maximum. If it is below the minimum, it is clamped to this minimum.
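The clamping rule can be sketched in Python as follows. This is a minimal model, not LLVM's implementation: the function names mirror the add and subtract intrinsics described in this section, operands are given as mathematical values rather than bit patterns, and the explicit `bits` parameter is an assumption of the sketch.

```python
def clamp(value: int, lo: int, hi: int) -> int:
    """Limit `value` to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def signed_range(bits: int):
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def unsigned_range(bits: int):
    return 0, (1 << bits) - 1


def sadd_sat(a: int, b: int, bits: int) -> int:
    return clamp(a + b, *signed_range(bits))


def ssub_sat(a: int, b: int, bits: int) -> int:
    return clamp(a - b, *signed_range(bits))


def uadd_sat(a: int, b: int, bits: int) -> int:
    return clamp(a + b, *unsigned_range(bits))


def usub_sat(a: int, b: int, bits: int) -> int:
    return clamp(a - b, *unsigned_range(bits))
```

Because Python integers do not wrap, the exact result is computed first and clamped afterwards. For non-negative operands, an unsigned addition can only exceed the maximum and an unsigned subtraction can only fall below zero.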

‘llvm.sadd.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.sadd.sat on any integer bit width or vectors of integers.

declare i16 @llvm.sadd.sat.i16(i16 %a, i16 %b)
declare i32 @llvm.sadd.sat.i32(i32 %a, i32 %b)
declare i64 @llvm.sadd.sat.i64(i64 %a, i64 %b)
declare <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.sadd.sat’ family of intrinsic functions perform signed saturating addition on the 2 arguments.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a and %b are the two values that will undergo signed addition.

Semantics:

The maximum value this operation can clamp to is the largest signed value representable by the bit width of the arguments. The minimum value is the smallest signed value representable by this bit width.

Examples:

%res = call i4 @llvm.sadd.sat.i4(i4 1, i4 2)  ; %res = 3
%res = call i4 @llvm.sadd.sat.i4(i4 5, i4 6)  ; %res = 7
%res = call i4 @llvm.sadd.sat.i4(i4 -4, i4 2)  ; %res = -2
%res = call i4 @llvm.sadd.sat.i4(i4 -4, i4 -5)  ; %res = -8

‘llvm.uadd.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.uadd.sat on any integer bit width or vectors of integers.

declare i16 @llvm.uadd.sat.i16(i16 %a, i16 %b)
declare i32 @llvm.uadd.sat.i32(i32 %a, i32 %b)
declare i64 @llvm.uadd.sat.i64(i64 %a, i64 %b)
declare <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.uadd.sat’ family of intrinsic functions perform unsigned saturating addition on the 2 arguments.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a and %b are the two values that will undergo unsigned addition.

Semantics:

The maximum value this operation can clamp to is the largest unsigned value representable by the bit width of the arguments. Because this is an unsigned operation, the result will never saturate towards zero.

Examples:

%res = call i4 @llvm.uadd.sat.i4(i4 1, i4 2)  ; %res = 3
%res = call i4 @llvm.uadd.sat.i4(i4 5, i4 6)  ; %res = 11
%res = call i4 @llvm.uadd.sat.i4(i4 8, i4 8)  ; %res = 15

‘llvm.ssub.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.ssub.sat on any integer bit width or vectors of integers.

declare i16 @llvm.ssub.sat.i16(i16 %a, i16 %b)
declare i32 @llvm.ssub.sat.i32(i32 %a, i32 %b)
declare i64 @llvm.ssub.sat.i64(i64 %a, i64 %b)
declare <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.ssub.sat’ family of intrinsic functions perform signed saturating subtraction on the 2 arguments.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a and %b are the two values that will undergo signed subtraction.

Semantics:

The maximum value this operation can clamp to is the largest signed value representable by the bit width of the arguments. The minimum value is the smallest signed value representable by this bit width.

Examples:

%res = call i4 @llvm.ssub.sat.i4(i4 2, i4 1)  ; %res = 1
%res = call i4 @llvm.ssub.sat.i4(i4 2, i4 6)  ; %res = -4
%res = call i4 @llvm.ssub.sat.i4(i4 -4, i4 5)  ; %res = -8
%res = call i4 @llvm.ssub.sat.i4(i4 4, i4 -5)  ; %res = 7

‘llvm.usub.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.usub.sat on any integer bit width or vectors of integers.

declare i16 @llvm.usub.sat.i16(i16 %a, i16 %b)
declare i32 @llvm.usub.sat.i32(i32 %a, i32 %b)
declare i64 @llvm.usub.sat.i64(i64 %a, i64 %b)
declare <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.usub.sat’ family of intrinsic functions perform unsigned saturating subtraction on the 2 arguments.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a and %b are the two values that will undergo unsigned subtraction.

Semantics:

The minimum value this operation can clamp to is 0, which is the smallest unsigned value representable by the bit width of the unsigned arguments. Because this is an unsigned operation, the result will never saturate towards the largest possible value representable by this bit width.

Examples:

%res = call i4 @llvm.usub.sat.i4(i4 2, i4 1)  ; %res = 1
%res = call i4 @llvm.usub.sat.i4(i4 2, i4 6)  ; %res = 0

‘llvm.sshl.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.sshl.sat on integers or vectors of integers of any bit width.

declare i16 @llvm.sshl.sat.i16(i16 %a, i16 %b)
declare i32 @llvm.sshl.sat.i32(i32 %a, i32 %b)
declare i64 @llvm.sshl.sat.i64(i64 %a, i64 %b)
declare <4 x i32> @llvm.sshl.sat.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.sshl.sat’ family of intrinsic functions perform signed saturating left shift on the first argument.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a is the value to be shifted, and %b is the amount to shift by. If b is (statically or dynamically) equal to or larger than the integer bit width of the arguments, the result is a poison value. If the arguments are vectors, each vector element of a is shifted by the corresponding shift amount in b.

Semantics:

The maximum value this operation can clamp to is the largest signed value representable by the bit width of the arguments. The minimum value is the smallest signed value representable by this bit width.

Examples:

%res = call i4 @llvm.sshl.sat.i4(i4 2, i4 1)  ; %res = 4
%res = call i4 @llvm.sshl.sat.i4(i4 2, i4 2)  ; %res = 7
%res = call i4 @llvm.sshl.sat.i4(i4 -5, i4 1)  ; %res = -8
%res = call i4 @llvm.sshl.sat.i4(i4 -1, i4 1)  ; %res = -2

‘llvm.ushl.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.ushl.sat on integers or vectors of integers of any bit width.

declare i16 @llvm.ushl.sat.i16(i16 %a, i16 %b)
declare i32 @llvm.ushl.sat.i32(i32 %a, i32 %b)
declare i64 @llvm.ushl.sat.i64(i64 %a, i64 %b)
declare <4 x i32> @llvm.ushl.sat.v4i32(<4 x i32> %a, <4 x i32> %b)

Overview:

The ‘llvm.ushl.sat’ family of intrinsic functions perform unsigned saturating left shift on the first argument.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a is the value to be shifted, and %b is the amount to shift by. If b is (statically or dynamically) equal to or larger than the integer bit width of the arguments, the result is a poison value. If the arguments are vectors, each vector element of a is shifted by the corresponding shift amount in b.

Semantics:

The maximum value this operation can clamp to is the largest unsigned value representable by the bit width of the arguments.

Examples:

%res = call i4 @llvm.ushl.sat.i4(i4 2, i4 1)  ; %res = 4
%res = call i4 @llvm.ushl.sat.i4(i4 3, i4 3)  ; %res = 15
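The saturating shifts can be modeled the same way as the other saturating operations: compute the exact shifted value, then clamp. In the sketch below, the function names mirror the intrinsics, operands are mathematical values, and raising `ValueError` for an out-of-range shift amount is purely an assumption of the model. In LLVM that case yields a poison value, which is not an exception.

```python
def sshl_sat(a: int, b: int, bits: int) -> int:
    """Signed saturating left shift of `a` by `b` at the given bit width."""
    if not 0 <= b < bits:
        # Stand-in for poison; chosen only so the model fails loudly.
        raise ValueError("shift amount must be smaller than the bit width")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a << b))


def ushl_sat(a: int, b: int, bits: int) -> int:
    """Unsigned saturating left shift of `a` by `b` at the given bit width."""
    if not 0 <= b < bits:
        raise ValueError("shift amount must be smaller than the bit width")
    return min((1 << bits) - 1, a << b)
```

Only the first argument is subject to saturation. The shift amount is used as given, which is why it has no clamped form.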

Fixed Point Arithmetic Intrinsics

A fixed point number represents a real data type for a number that has a fixed number of digits after a radix point (equivalent to the decimal point ‘.’). The number of digits after the radix point is referred to as the scale. These are useful for representing fractional values to a specific precision. The following intrinsics perform fixed point arithmetic operations on 2 operands of the same scale, specified as the third argument.

The llvm.*mul.fix family of intrinsic functions represents a multiplication of fixed point numbers through scaled integers. Therefore, fixed point multiplication can be represented as:

%result = call i4 @llvm.smul.fix.i4(i4 %a, i4 %b, i32 %scale)

; Expands to
%a2 = sext i4 %a to i8
%b2 = sext i4 %b to i8
%mul = mul nsw i8 %a2, %b2
%scale2 = trunc i32 %scale to i8
%r = ashr i8 %mul, %scale2  ; this is for a target rounding down towards negative infinity
%result = trunc i8 %r to i4

The llvm.*div.fix family of intrinsic functions represents a division of fixed point numbers through scaled integers. Fixed point division can be represented as:

%result = call i4 @llvm.sdiv.fix.i4(i4 %a, i4 %b, i32 %scale)

; Expands to
%a2 = sext i4 %a to i8
%b2 = sext i4 %b to i8
%scale2 = trunc i32 %scale to i8
%a3 = shl i8 %a2, %scale2
%r = sdiv i8 %a3, %b2  ; this is for a target rounding towards zero
%result = trunc i8 %r to i4

For each of these functions, if the result cannot be represented exactly with the provided scale, the result is rounded. Rounding is unspecified since preferred rounding may vary for different targets. Rounding is specified through a target hook. Different pipelines should legalize or optimize this using the rounding specified by this hook if it is provided. Operations like constant folding, instruction combining, KnownBits, and ValueTracking should also use this hook, if provided, and not assume the direction of rounding. A rounded result must always be within one unit of precision from the true result. That is, the error between the returned result and the true result must be less than 1/2^(scale).
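The two rounding directions and the error bound can be checked with a short Python model. This is a sketch under stated assumptions: `smul_fix`, `sdiv_fix`, and `fix_error` are hypothetical helper names, the `toward_zero` flag stands in for the target hook, operands are mathematical integers, and results that would not fit the fixed point type are not modeled.

```python
from fractions import Fraction


def smul_fix(a: int, b: int, scale: int, toward_zero: bool = False) -> int:
    """Signed fixed point multiply of two scaled integers."""
    prod = a * b                     # exact product, carries 2 * scale fraction bits
    if not toward_zero:
        return prod >> scale         # arithmetic shift: rounds toward -infinity
    q = abs(prod) >> scale           # truncate the magnitude: rounds toward zero
    return q if prod >= 0 else -q


def sdiv_fix(a: int, b: int, scale: int, toward_zero: bool = True) -> int:
    """Signed fixed point divide of two scaled integers (b must be nonzero)."""
    num = a << scale                 # pre-scale the dividend
    if not toward_zero:
        return num // b              # floor division: rounds toward -infinity
    q = abs(num) // abs(b)           # truncate the magnitude: rounds toward zero
    return q if (num >= 0) == (b > 0) else -q


def fix_error(result: int, exact: Fraction, scale: int) -> Fraction:
    """Absolute difference between a scaled result and the exact value."""
    return abs(Fraction(result, 1 << scale) - exact)
```

Either rounding direction keeps the error strictly below 1/2^(scale), which is the only property a caller may rely on.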

‘llvm.smul.fix.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.smul.fix on any integer bit width or vectors of integers.

declare i16 @llvm.smul.fix.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.smul.fix.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.smul.fix.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

Overview:

The ‘llvm.smul.fix’ family of intrinsic functions perform signed fixed point multiplication on 2 arguments of the same scale.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. The arguments may also be vectors of integers with the same length and element size. %a and %b are the two values that will undergo signed fixed point multiplication. The argument %scale represents the scale of both operands, and must be a constant integer.

Semantics:

This operation performs fixed point multiplication on the 2 arguments of a specified scale. The result will also be returned in the same scale specified in the third argument.

If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified.

It is undefined behavior if the result value does not fit within the range of the fixed point type.

Examples:

%res = call i4 @llvm.smul.fix.i4(i4 3, i4 2, i32 0)  ; %res = 6 (2 x 3 = 6)
%res = call i4 @llvm.smul.fix.i4(i4 3, i4 2, i32 1)  ; %res = 3 (1.5 x 1 = 1.5)
%res = call i4 @llvm.smul.fix.i4(i4 3, i4 -2, i32 1)  ; %res = -3 (1.5 x -1 = -1.5)

; The result in the following could be rounded up to -2 or down to -2.5
%res = call i4 @llvm.smul.fix.i4(i4 3, i4 -3, i32 1)  ; %res = -5 (or -4) (1.5 x -1.5 = -2.25)

‘llvm.umul.fix.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.umul.fix on any integer bit width or vectors of integers.

declare i16 @llvm.umul.fix.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.umul.fix.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.umul.fix.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.umul.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

Overview:

The ‘llvm.umul.fix’ family of intrinsic functions perform unsigned fixed point multiplication on 2 arguments of the same scale.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. The arguments may also be vectors of integers with the same length and element size. %a and %b are the two values that will undergo unsigned fixed point multiplication. The argument %scale represents the scale of both operands, and must be a constant integer.

Semantics:

This operation performs unsigned fixed point multiplication on the 2 arguments of a specified scale. The result will also be returned in the same scale specified in the third argument.

If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified.

It is undefined behavior if the result value does not fit within the range of the fixed point type.

Examples:

%res = call i4 @llvm.umul.fix.i4(i4 3, i4 2, i32 0)  ; %res = 6 (2 x 3 = 6)
%res = call i4 @llvm.umul.fix.i4(i4 3, i4 2, i32 1)  ; %res = 3 (1.5 x 1 = 1.5)

; The result in the following could be rounded down to 3.5 or up to 4
%res = call i4 @llvm.umul.fix.i4(i4 15, i4 1, i32 1)  ; %res = 7 (or 8) (7.5 x 0.5 = 3.75)

‘llvm.smul.fix.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.smul.fix.sat on any integer bit width or vectors of integers.

declare i16 @llvm.smul.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.smul.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.smul.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

Overview:

The ‘llvm.smul.fix.sat’ family of intrinsic functions perform signed fixed point saturating multiplication on 2 arguments of the same scale.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a and %b are the two values that will undergo signed fixed point multiplication. The argument %scale represents the scale of both operands, and must be a constant integer.

Semantics:

This operation performs fixed point multiplication on the 2 arguments of a specified scale. The result will also be returned in the same scale specified in the third argument.

If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified.

The maximum value this operation can clamp to is the largest signed value representable by the bit width of the first 2 arguments. The minimum value is the smallest signed value representable by this bit width.

Examples:

%res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 0)  ; %res = 6 (2 x 3 = 6)
%res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 2, i32 1)  ; %res = 3 (1.5 x 1 = 1.5)
%res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 -2, i32 1)  ; %res = -3 (1.5 x -1 = -1.5)

; The result in the following could be rounded up to -2 or down to -2.5
%res = call i4 @llvm.smul.fix.sat.i4(i4 3, i4 -3, i32 1)  ; %res = -5 (or -4) (1.5 x -1.5 = -2.25)

; Saturation
%res = call i4 @llvm.smul.fix.sat.i4(i4 7, i4 2, i32 0)  ; %res = 7
%res = call i4 @llvm.smul.fix.sat.i4(i4 7, i4 4, i32 2)  ; %res = 7
%res = call i4 @llvm.smul.fix.sat.i4(i4 -8, i4 5, i32 2)  ; %res = -8
%res = call i4 @llvm.smul.fix.sat.i4(i4 -8, i4 -2, i32 1)  ; %res = 7

; Scale can affect the saturation result
%res = call i4 @llvm.smul.fix.sat.i4(i4 2, i4 4, i32 0)  ; %res = 7 (2 x 4 -> clamped to 7)
%res = call i4 @llvm.smul.fix.sat.i4(i4 2, i4 4, i32 1)  ; %res = 4 (1 x 2 = 2)

‘llvm.umul.fix.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.umul.fix.sat on any integer bit width or vectors of integers.

declare i16 @llvm.umul.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.umul.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.umul.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.umul.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

Overview:

The ‘llvm.umul.fix.sat’ family of intrinsic functions perform unsigned fixed point saturating multiplication on 2 arguments of the same scale.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a and %b are the two values that will undergo unsigned fixed point multiplication. The argument %scale represents the scale of both operands, and must be a constant integer.

Semantics:

This operation performs fixed point multiplication on the 2 arguments of a specified scale. The result will also be returned in the same scale specified in the third argument.

If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified.

The maximum value this operation can clamp to is the largest unsigned value representable by the bit width of the first 2 arguments. The minimum value is the smallest unsigned value representable by this bit width (zero).

Examples:

%res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 2, i32 0)  ; %res = 6 (2 x 3 = 6)
%res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 2, i32 1)  ; %res = 3 (1.5 x 1 = 1.5)

; The result in the following could be rounded down to 2 or up to 2.5
%res = call i4 @llvm.umul.fix.sat.i4(i4 3, i4 3, i32 1)  ; %res = 4 (or 5) (1.5 x 1.5 = 2.25)

; Saturation
%res = call i4 @llvm.umul.fix.sat.i4(i4 8, i4 2, i32 0)  ; %res = 15 (8 x 2 -> clamped to 15)
%res = call i4 @llvm.umul.fix.sat.i4(i4 8, i4 8, i32 2)  ; %res = 15 (2 x 2 -> clamped to 3.75)

; Scale affects the result (neither call saturates, since 8 fits in an unsigned i4)
%res = call i4 @llvm.umul.fix.sat.i4(i4 2, i4 4, i32 0)  ; %res = 8 (2 x 4 = 8)
%res = call i4 @llvm.umul.fix.sat.i4(i4 2, i4 4, i32 1)  ; %res = 4 (1 x 2 = 2)
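Saturating fixed point multiplication combines the scaled multiply with a final clamp. The Python sketch below models both the signed and unsigned forms. The function names are hypothetical, operands are mathematical values, and rounding toward negative infinity is only one of the directions a target may choose.

```python
def _mul_fix_sat(a: int, b: int, scale: int, lo: int, hi: int) -> int:
    raw = (a * b) >> scale           # rounds toward -infinity (an assumption)
    return max(lo, min(hi, raw))     # clamp to the representable range


def smul_fix_sat(a: int, b: int, scale: int, bits: int) -> int:
    """Signed saturating fixed point multiply at the given bit width."""
    return _mul_fix_sat(a, b, scale, -(1 << (bits - 1)), (1 << (bits - 1)) - 1)


def umul_fix_sat(a: int, b: int, scale: int, bits: int) -> int:
    """Unsigned saturating fixed point multiply at the given bit width."""
    return _mul_fix_sat(a, b, scale, 0, (1 << bits) - 1)
```

The clamp is applied to the scaled integer after the shift, so whether a given pair of operands saturates depends on the scale as well as on the bit width and signedness.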

‘llvm.sdiv.fix.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.sdiv.fix on any integer bit width or vectors of integers.

declare i16 @llvm.sdiv.fix.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.sdiv.fix.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.sdiv.fix.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

Overview:

The ‘llvm.sdiv.fix’ family of intrinsic functions perform signed fixed point division on 2 arguments of the same scale.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. The arguments may also be vectors of integers with the same length and element size. %a and %b are the two values that will undergo signed fixed point division. The argument %scale represents the scale of both operands, and must be a constant integer.

Semantics:

This operation performs fixed point division on the 2 arguments of a specified scale. The result will also be returned in the same scale specified in the third argument.

If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified.

It is undefined behavior if the result value does not fit within the range of the fixed point type, or if the second argument is zero.

Examples:

%res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 2, i32 0)  ; %res = 3 (6 / 2 = 3)
%res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 4, i32 1)  ; %res = 3 (3 / 2 = 1.5)
%res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 -2, i32 1)  ; %res = -3 (1.5 / -1 = -1.5)

; The result in the following could be rounded up to 1 or down to 0.5
%res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 4, i32 1)  ; %res = 2 (or 1) (1.5 / 2 = 0.75)

‘llvm.udiv.fix.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.udiv.fix on any integer bit width or vectors of integers.

declare i16 @llvm.udiv.fix.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.udiv.fix.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.udiv.fix.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

Overview:

The ‘llvm.udiv.fix’ family of intrinsic functions perform unsigned fixed point division on 2 arguments of the same scale.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. The arguments may also be vectors of integers with the same length and element size. %a and %b are the two values that will undergo unsigned fixed point division. The argument %scale represents the scale of both operands, and must be a constant integer.

Semantics:

This operation performs fixed point division on the 2 arguments of a specified scale. The result will also be returned in the same scale specified in the third argument.

If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified.

It is undefined behavior if the result value does not fit within the range of the fixed point type, or if the second argument is zero.

Examples:

%res = call i4 @llvm.udiv.fix.i4(i4 6, i4 2, i32 0)  ; %res = 3 (6 / 2 = 3)
%res = call i4 @llvm.udiv.fix.i4(i4 6, i4 4, i32 1)  ; %res = 3 (3 / 2 = 1.5)
%res = call i4 @llvm.udiv.fix.i4(i4 1, i4 -8, i32 4)  ; %res = 2 (0.0625 / 0.5 = 0.125)

; The result in the following could be rounded up to 1 or down to 0.5
%res = call i4 @llvm.udiv.fix.i4(i4 3, i4 4, i32 1)  ; %res = 2 (or 1) (1.5 / 2 = 0.75)

‘llvm.sdiv.fix.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.sdiv.fix.sat on any integer bit width or vectors of integers.

declare i16 @llvm.sdiv.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.sdiv.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.sdiv.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.sdiv.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

Overview:

The ‘llvm.sdiv.fix.sat’ family of intrinsic functions perform signed fixed point saturating division on 2 arguments of the same scale.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a and %b are the two values that will undergo signed fixed point division. The argument %scale represents the scale of both operands, and must be a constant integer.

Semantics:

This operation performs fixed point division on the 2 arguments of a specified scale. The result will also be returned in the same scale specified in the third argument.

If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified.

The maximum value this operation can clamp to is the largest signed value representable by the bit width of the first 2 arguments. The minimum value is the smallest signed value representable by this bit width.

It is undefined behavior if the second argument is zero.

Examples:

%res = call i4 @llvm.sdiv.fix.sat.i4(i4 6, i4 2, i32 0)  ; %res = 3 (6 / 2 = 3)
%res = call i4 @llvm.sdiv.fix.sat.i4(i4 6, i4 4, i32 1)  ; %res = 3 (3 / 2 = 1.5)
%res = call i4 @llvm.sdiv.fix.sat.i4(i4 3, i4 -2, i32 1)  ; %res = -3 (1.5 / -1 = -1.5)

; The result in the following could be rounded up to 1 or down to 0.5
%res = call i4 @llvm.sdiv.fix.sat.i4(i4 3, i4 4, i32 1)  ; %res = 2 (or 1) (1.5 / 2 = 0.75)

; Saturation
%res = call i4 @llvm.sdiv.fix.sat.i4(i4 -8, i4 -1, i32 0)  ; %res = 7 (-8 / -1 = 8 => 7)
%res = call i4 @llvm.sdiv.fix.sat.i4(i4 4, i4 2, i32 2)  ; %res = 7 (1 / 0.5 = 2 => 1.75)
%res = call i4 @llvm.sdiv.fix.sat.i4(i4 -4, i4 1, i32 2)  ; %res = -8 (-1 / 0.25 = -4 => -2)

‘llvm.udiv.fix.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.udiv.fix.sat on any integer bit width or vectors of integers.

declare i16 @llvm.udiv.fix.sat.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.udiv.fix.sat.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.udiv.fix.sat.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.udiv.fix.sat.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)

Overview:

The ‘llvm.udiv.fix.sat’ family of intrinsic functions perform unsigned fixed point saturating division on 2 arguments of the same scale.

Arguments:

The arguments (%a and %b) and the result may be of integer types of any bit width, but they must have the same bit width. %a and %b are the two values that will undergo unsigned fixed point division. The argument %scale represents the scale of both operands, and must be a constant integer.

Semantics:

This operation performs fixed point division on the 2 arguments of a specified scale. The result will also be returned in the same scale specified in the third argument.

If the result value cannot be precisely represented in the given scale, the value is rounded up or down to the closest representable value. The rounding direction is unspecified.

The maximum value this operation can clamp to is the largest unsigned value representable by the bit width of the first 2 arguments. The minimum value is the smallest unsigned value representable by this bit width (zero).

It is undefined behavior if the second argument is zero.

Examples:

%res = call i4 @llvm.udiv.fix.sat.i4(i4 6, i4 2, i32 0)  ; %res = 3 (6 / 2 = 3)
%res = call i4 @llvm.udiv.fix.sat.i4(i4 6, i4 4, i32 1)  ; %res = 3 (3 / 2 = 1.5)

; The result in the following could be rounded down to 0.5 or up to 1
%res = call i4 @llvm.udiv.fix.sat.i4(i4 3, i4 4, i32 1)  ; %res = 1 (or 2) (1.5 / 2 = 0.75)

; Saturation
%res = call i4 @llvm.udiv.fix.sat.i4(i4 8, i4 2, i32 2)  ; %res = 15 (2 / 0.5 = 4 => 3.75)
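Saturating fixed point division can be modeled as a pre-scaled integer division followed by a clamp. In this Python sketch the function names are hypothetical, operands are mathematical values, and rounding toward zero is an assumed choice among the permitted directions. The `ZeroDivisionError` only marks the case that is undefined behavior in LLVM.

```python
def _div_fix_sat(a: int, b: int, scale: int, lo: int, hi: int) -> int:
    if b == 0:
        # Undefined behavior in LLVM; the exception is a modeling choice.
        raise ZeroDivisionError("fixed point division by zero")
    num = a << scale                     # pre-scale the dividend
    q = abs(num) // abs(b)               # truncate the magnitude (toward zero)
    raw = q if (num >= 0) == (b > 0) else -q
    return max(lo, min(hi, raw))         # clamp to the representable range


def sdiv_fix_sat(a: int, b: int, scale: int, bits: int) -> int:
    """Signed saturating fixed point divide at the given bit width."""
    return _div_fix_sat(a, b, scale, -(1 << (bits - 1)), (1 << (bits - 1)) - 1)


def udiv_fix_sat(a: int, b: int, scale: int, bits: int) -> int:
    """Unsigned saturating fixed point divide at the given bit width."""
    return _div_fix_sat(a, b, scale, 0, (1 << bits) - 1)
```

Unlike the non-saturating division, an out-of-range quotient is clamped rather than being undefined. Only a zero divisor remains undefined.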

Specialized Arithmetic Intrinsics

llvm.canonicalize.*’ Intrinsic

Syntax:
declarefloat@llvm.canonicalize.f32(float%a)declaredouble@llvm.canonicalize.f64(double%b)
Overview:

The ‘llvm.canonicalize.*’ intrinsic returns the platform specific canonicalencoding of a floating-point number. This canonicalization is useful forimplementing certain numeric primitives such as frexp. The canonical encoding isdefined by IEEE-754-2008 to be:

2.1.8canonicalencoding:Thepreferredencodingofafloating-pointrepresentationinaformat.Appliedtodeclets,significandsoffinitenumbers,infinities,andNaNs,especiallyindecimalformats.

This operation can also be considered equivalent to the IEEE-754-2008conversion of a floating-point value to the same format. NaNs are handledaccording to section 6.2.

Examples of non-canonical encodings:

  • x87 pseudo denormals, pseudo NaNs, pseudo Infinity, Unnormals. These are converted to a canonical representation per hardware-specific protocol.

  • Many normal decimal floating-point numbers have non-canonical alternative encodings.

  • Some machines, like GPUs or ARMv7 NEON, do not support subnormal values. These are treated as non-canonical encodings of zero and will be flushed to a zero of the same sign by this operation.

Note that per IEEE-754-2008 6.2, systems that support signaling NaNs with default exception handling must signal an invalid exception, and produce a quiet NaN result.

This function should always be implementable as multiplication by 1.0, provided that the compiler does not constant fold the operation. Likewise, division by 1.0 and llvm.minnum(x, x) are possible implementations. Addition with -0.0 is also sufficient provided that the rounding mode is not -Infinity.

@llvm.canonicalize must preserve the equality relation. That is:

  • (@llvm.canonicalize(x) == x) is equivalent to (x == x)

  • (@llvm.canonicalize(x) == @llvm.canonicalize(y)) is equivalent to (x == y)

Additionally, the sign of zero must be conserved: @llvm.canonicalize(-0.0) = -0.0 and @llvm.canonicalize(+0.0) = +0.0

The payload bits of a NaN must be conserved, with two exceptions. First, environments which use only a single canonical representation of NaN must perform said canonicalization. Second, SNaNs must be quieted per the usual methods.

The canonicalization operation may be optimized away if:

  • The input is known to be canonical. For example, it was produced by a floating-point operation that is required by the standard to be canonical.

  • The result is consumed only by (or fused with) other floating-point operations. That is, the bits of the floating-point value are not examined.
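
The rules above can be illustrated on raw binary32 bit patterns. The Python sketch below is a simplified model under stated assumptions: the quiet-NaN flag is taken to be the most significant significand bit (the encoding IEEE-754-2008 recommends), and the flush_subnormals parameter stands in for a target without subnormal support. The function name is hypothetical, and the model ignores exception signaling.

```python
SIGN_MASK = 0x80000000
EXP_MASK = 0x7F800000
FRAC_MASK = 0x007FFFFF
QUIET_BIT = 0x00400000  # assumed position of the quiet-NaN flag


def canonicalize_f32_bits(bits, flush_subnormals=False):
    """Return the canonical binary32 bit pattern for `bits` in this model."""
    exponent = bits & EXP_MASK
    fraction = bits & FRAC_MASK
    if exponent == EXP_MASK and fraction != 0:
        # NaN: quiet a signaling NaN, keeping the remaining payload bits.
        return bits | QUIET_BIT
    if flush_subnormals and exponent == 0 and fraction != 0:
        # Subnormal on a target without subnormal support:
        # flush to a zero of the same sign.
        return bits & SIGN_MASK
    # Zeros, normal numbers and infinities are already canonical in binary32.
    return bits
```

For instance, the signaling NaN pattern 0x7F800001 maps to 0x7FC00001, and both zeros keep their sign bit.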

‘llvm.fmuladd.*’ Intrinsic

Syntax:
declare float @llvm.fmuladd.f32(float %a, float %b, float %c)
declare double @llvm.fmuladd.f64(double %a, double %b, double %c)
Overview:

The ‘llvm.fmuladd.*’ intrinsic functions represent multiply-add expressions that can be fused if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) that the fused operation is more efficient than the equivalent, separate pair of mul and add instructions.

Arguments:

The ‘llvm.fmuladd.*’ intrinsics each take three arguments: two multiplicands, a and b, and an addend c.

Semantics:

The expression:

%0 = call float @llvm.fmuladd.f32(%a, %b, %c)

is equivalent to the expression a * b + c, except that it is unspecified whether rounding will be performed between the multiplication and addition steps. Fusion is not guaranteed, even if the target platform supports it. If a fused multiply-add is required, the corresponding llvm.fma intrinsic function should be used instead. Like ‘llvm.fma.*’, this intrinsic never sets errno.

Examples:
%r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c
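
Because the intermediate rounding is unspecified, the two permitted evaluations can produce different results. The following Python sketch is only an illustration: it uses double precision rather than float, handles finite inputs only, and emulates the fused form with exact rational arithmetic followed by a single rounding. Both function names are hypothetical.

```python
from fractions import Fraction


def fmuladd_unfused(a, b, c):
    """Round after the multiplication and again after the addition."""
    return a * b + c


def fmuladd_fused(a, b, c):
    """Compute a * b + c exactly, then round once (finite inputs only)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
unfused = fmuladd_unfused(a, b, c)
fused = fmuladd_fused(a, b, c)
```

Here the unfused result is 0.0 while the fused result is -2**-60. A program that uses ‘llvm.fmuladd.*’ must accept either outcome.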

Hardware-Loop Intrinsics

LLVM supports several intrinsics to mark a loop as a hardware-loop. They are hints to the backend, which is required either to lower these intrinsics further to target-specific instructions, or to revert the hardware-loop to a normal loop if target-specific restrictions are not met and a hardware-loop can’t be generated.

These intrinsics may be modified in the future and are not intended to be used outside the backend. Thus, front-end and mid-level optimizations should not be generating these intrinsics.

‘llvm.set.loop.iterations.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare void @llvm.set.loop.iterations.i32(i32)
declare void @llvm.set.loop.iterations.i64(i64)
Overview:

The ‘llvm.set.loop.iterations.*’ intrinsics are used to specify the hardware-loop trip count. They are placed in the loop preheader basic block and are marked as IntrNoDuplicate to avoid optimizers duplicating these instructions.

Arguments:

The integer operand is the loop trip count of the hardware-loop, and thus not e.g. the loop back-edge taken count.

Semantics:

The ‘llvm.set.loop.iterations.*’ intrinsics do not perform any arithmetic on their operand. They are a hint to the backend, which can use them to set up the hardware-loop count with a target-specific instruction, usually a move of this value to a special register or a hardware-loop instruction.

‘llvm.start.loop.iterations.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.start.loop.iterations.i32(i32)
declare i64 @llvm.start.loop.iterations.i64(i64)
Overview:

The ‘llvm.start.loop.iterations.*’ intrinsics are similar to the ‘llvm.set.loop.iterations.*’ intrinsics: they specify the hardware-loop trip count, but also produce a value identical to the input that can be used as the input to the loop. They are placed in the loop preheader basic block, and the output is expected to be the input to the phi for the induction variable of the loop, which is decremented by ‘llvm.loop.decrement.reg.*’.

Arguments:

The integer operand is the loop trip count of the hardware-loop, and thus not e.g. the loop back-edge taken count.

Semantics:

The ‘llvm.start.loop.iterations.*’ intrinsics do not perform any arithmetic on their operand. They are a hint to the backend, which can use them to set up the hardware-loop count with a target-specific instruction, usually a move of this value to a special register or a hardware-loop instruction.

‘llvm.test.set.loop.iterations.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare i1 @llvm.test.set.loop.iterations.i32(i32)
declare i1 @llvm.test.set.loop.iterations.i64(i64)
Overview:

The ‘llvm.test.set.loop.iterations.*’ intrinsics are used to specify the loop trip count, and also test that the given count is not zero, allowing it to control entry to a while-loop. They are placed in the loop preheader’s predecessor basic block, and are marked as IntrNoDuplicate to avoid optimizers duplicating these instructions.

Arguments:

The integer operand is the loop trip count of the hardware-loop, and thus not e.g. the loop back-edge taken count.

Semantics:

The ‘llvm.test.set.loop.iterations.*’ intrinsics do not perform any arithmetic on their operand. They are a hint to the backend, which can use them to set up the hardware-loop count with a target-specific instruction, usually a move of this value to a special register or a hardware-loop instruction. The result is the conditional value of whether the given count is not zero.

‘llvm.test.start.loop.iterations.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare {i32, i1} @llvm.test.start.loop.iterations.i32(i32)
declare {i64, i1} @llvm.test.start.loop.iterations.i64(i64)
Overview:

The ‘llvm.test.start.loop.iterations.*’ intrinsics are similar to the ‘llvm.test.set.loop.iterations.*’ and ‘llvm.start.loop.iterations.*’ intrinsics: they specify the hardware-loop trip count, but also produce a value identical to the input that can be used as the input to the loop. The second i1 output controls entry to a while-loop.

Arguments:

The integer operand is the loop trip count of the hardware-loop, and thus not e.g. the loop back-edge taken count.

Semantics:

The ‘llvm.test.start.loop.iterations.*’ intrinsics do not perform any arithmetic on their operand. They are a hint to the backend, which can use them to set up the hardware-loop count with a target-specific instruction, usually a move of this value to a special register or a hardware-loop instruction. The result is a pair of the input and a conditional value of whether the given count is not zero.

‘llvm.loop.decrement.reg.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.loop.decrement.reg.i32(i32, i32)
declare i64 @llvm.loop.decrement.reg.i64(i64, i64)
Overview:

The ‘llvm.loop.decrement.reg.*’ intrinsics are used to lower the loop iteration counter and return an updated value that will be used in the next loop test check.

Arguments:

Both arguments must have identical integer types. The first operand is the loop iteration counter. The second operand is the maximum number of elements processed in an iteration.

Semantics:

The ‘llvm.loop.decrement.reg.*’ intrinsics do an integer SUB of their two operands, which is not allowed to wrap. They return the remaining number of iterations still to be executed, and can be used together with a PHI, ICMP and BR to control the number of loop iterations executed. Any optimizations are allowed to treat it as a SUB, and it is supported by SCEV, so it’s the backend’s responsibility to handle cases where it may be optimized. These intrinsics are marked as IntrNoDuplicate to avoid optimizers duplicating these instructions.

‘llvm.loop.decrement.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare i1 @llvm.loop.decrement.i32(i32)
declare i1 @llvm.loop.decrement.i64(i64)
Overview:

The HardwareLoops pass allows the loop decrement value to be specified with an option. It defaults to a loop decrement value of 1, but it can be an unsigned integer value provided by this option. The ‘llvm.loop.decrement.*’ intrinsics decrement the loop iteration counter with this value, and return a false predicate if the loop should exit, and true otherwise. This is emitted if the loop counter is not updated via a PHI node, which can also be controlled with an option.

Arguments:

The integer argument is the loop decrement value used to decrement the loop iteration counter.

Semantics:

The ‘llvm.loop.decrement.*’ intrinsics do a SUB of the loop iteration counter with the given loop decrement value, and return false if the loop should exit. This SUB is not allowed to wrap. The result is a condition that is used by the conditional branch controlling the loop.

Vector Reduction Intrinsics

Horizontal reductions of vectors can be expressed using the following intrinsics. Each one takes a vector operand as an input and applies its respective operation across all elements of the vector, returning a single scalar result of the same element type.

‘llvm.vector.reduce.add.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)
declare i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %a)
Overview:

The ‘llvm.vector.reduce.add.*’ intrinsics do an integer ADD reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.

‘llvm.vector.reduce.fadd.*’ Intrinsic

Syntax:
declare float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %a)
declare double @llvm.vector.reduce.fadd.v2f64(double %start_value, <2 x double> %a)
Overview:

The ‘llvm.vector.reduce.fadd.*’ intrinsics do a floating-point ADD reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

If the intrinsic call has the ‘reassoc’ flag set, then the reduction will not preserve the associativity of an equivalent scalarized counterpart. Otherwise the reduction will be sequential, thus implying that the operation respects the associativity of a scalarized reduction. That is, the reduction begins with the start value and performs an fadd operation with consecutively increasing vector element indices. See the following pseudocode:

float sequential_fadd(start_value, input_vector)
  result = start_value
  for i = 0 to length(input_vector)
    result = result + input_vector[i]
  return result
Arguments:

The first argument to this intrinsic is a scalar start value for the reduction. The type of the start value matches the element-type of the vector input. The second argument must be a vector of floating-point values.

To ignore the start value, negative zero (-0.0) can be used, as it is the neutral value of floating point addition.

Examples:
%unord = call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %input) ; relaxed reduction
%ord = call float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %input) ; sequential reduction
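
The difference between the sequential and the relaxed form is observable. In the Python sketch below, pairwise_fadd is one hypothetical order that a ‘reassoc’ reduction might use; any other association would be equally valid. Python floats are doubles rather than float, so the operand values are chosen for double precision.

```python
import math


def sequential_fadd(start_value, input_vector):
    """Strictly ordered reduction, as in the pseudocode above."""
    result = start_value
    for element in input_vector:
        result = result + element
    return result


def pairwise_fadd(start_value, input_vector):
    """One possible reassociated order: a balanced tree of additions."""
    def tree(values):
        if len(values) == 1:
            return values[0]
        mid = len(values) // 2
        return tree(values[:mid]) + tree(values[mid:])
    return start_value + tree(input_vector)


values = [1e16, 1.0, -1e16, 1.0]
ordered = sequential_fadd(-0.0, values)   # ((1e16 + 1) - 1e16) + 1
relaxed = pairwise_fadd(-0.0, values)     # (1e16 + 1) + (-1e16 + 1)
```

Here the ordered result is 1.0 and the pairwise result is 0.0. The sketch also shows why -0.0 rather than +0.0 is the neutral start value: sequential_fadd(-0.0, [-0.0]) stays -0.0, whereas a start value of +0.0 would turn it into +0.0.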

‘llvm.vector.reduce.mul.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %a)
declare i64 @llvm.vector.reduce.mul.v2i64(<2 x i64> %a)
Overview:

The ‘llvm.vector.reduce.mul.*’ intrinsics do an integer MUL reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.

‘llvm.vector.reduce.fmul.*’ Intrinsic

Syntax:
declare float @llvm.vector.reduce.fmul.v4f32(float %start_value, <4 x float> %a)
declare double @llvm.vector.reduce.fmul.v2f64(double %start_value, <2 x double> %a)
Overview:

The ‘llvm.vector.reduce.fmul.*’ intrinsics do a floating-point MUL reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

If the intrinsic call has the ‘reassoc’ flag set, then the reduction will not preserve the associativity of an equivalent scalarized counterpart. Otherwise the reduction will be sequential, thus implying that the operation respects the associativity of a scalarized reduction. That is, the reduction begins with the start value and performs an fmul operation with consecutively increasing vector element indices. See the following pseudocode:

float sequential_fmul(start_value, input_vector)
  result = start_value
  for i = 0 to length(input_vector)
    result = result * input_vector[i]
  return result
Arguments:

The first argument to this intrinsic is a scalar start value for the reduction. The type of the start value matches the element-type of the vector input. The second argument must be a vector of floating-point values.

To ignore the start value, one (1.0) can be used, as it is the neutral value of floating point multiplication.

Examples:
%unord = call reassoc float @llvm.vector.reduce.fmul.v4f32(float 1.0, <4 x float> %input) ; relaxed reduction
%ord = call float @llvm.vector.reduce.fmul.v4f32(float %start_value, <4 x float> %input) ; sequential reduction

‘llvm.vector.reduce.and.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %a)
Overview:

The ‘llvm.vector.reduce.and.*’ intrinsics do a bitwise AND reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.

‘llvm.vector.reduce.or.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %a)
Overview:

The ‘llvm.vector.reduce.or.*’ intrinsics do a bitwise OR reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.

‘llvm.vector.reduce.xor.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %a)
Overview:

The ‘llvm.vector.reduce.xor.*’ intrinsics do a bitwise XOR reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.

‘llvm.vector.reduce.smax.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> %a)
Overview:

The ‘llvm.vector.reduce.smax.*’ intrinsics do a signed integer MAX reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.

‘llvm.vector.reduce.smin.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> %a)
Overview:

The ‘llvm.vector.reduce.smin.*’ intrinsics do a signed integer MIN reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.

‘llvm.vector.reduce.umax.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %a)
Overview:

The ‘llvm.vector.reduce.umax.*’ intrinsics do an unsigned integer MAX reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.

‘llvm.vector.reduce.umin.*’ Intrinsic

Syntax:
declare i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %a)
Overview:

The ‘llvm.vector.reduce.umin.*’ intrinsics do an unsigned integer MIN reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

Arguments:

The argument to this intrinsic must be a vector of integer values.
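
The signed and unsigned reductions differ only in how the same lane bits are interpreted. The Python sketch below is an informal model with hypothetical helper names. Lanes are given as raw bit patterns, and the signed variants return the two’s-complement interpretation of the selected lane.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def reduce_umax(lanes, bits):
    return max(lane & ((1 << bits) - 1) for lane in lanes)


def reduce_umin(lanes, bits):
    return min(lane & ((1 << bits) - 1) for lane in lanes)


def reduce_smax(lanes, bits):
    return max(to_signed(lane, bits) for lane in lanes)


def reduce_smin(lanes, bits):
    return min(to_signed(lane, bits) for lane in lanes)


lanes_i8 = [0xFF, 0x01, 0x7F]  # as signed i8 values: -1, 1, 127
```

For these i8 lanes the unsigned maximum is 255 (the lane 0xFF), while the signed maximum is 127. The signed minimum is -1, which is the same 0xFF lane read as a signed value.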

‘llvm.vector.reduce.fmax.*’ Intrinsic

Syntax:
declare float @llvm.vector.reduce.fmax.v4f32(<4 x float> %a)
declare double @llvm.vector.reduce.fmax.v2f64(<2 x double> %a)
Overview:

The ‘llvm.vector.reduce.fmax.*’ intrinsics do a floating-point MAX reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

This instruction has the same comparison semantics as the ‘llvm.maxnum.*’ intrinsic. If the intrinsic call has the nnan fast-math flag, then the operation can assume that NaNs are not present in the input vector.

Arguments:

The argument to this intrinsic must be a vector of floating-point values.

‘llvm.vector.reduce.fmin.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vector.reduce.fmin.v4f32(<4 x float> %a)
declare double @llvm.vector.reduce.fmin.v2f64(<2 x double> %a)
Overview:

The ‘llvm.vector.reduce.fmin.*’ intrinsics do a floating-point MIN reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

This instruction has the same comparison semantics as the ‘llvm.minnum.*’ intrinsic. If the intrinsic call has the nnan fast-math flag, then the operation can assume that NaNs are not present in the input vector.

Arguments:

The argument to this intrinsic must be a vector of floating-point values.

‘llvm.vector.reduce.fmaximum.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vector.reduce.fmaximum.v4f32(<4 x float> %a)
declare double @llvm.vector.reduce.fmaximum.v2f64(<2 x double> %a)
Overview:

The ‘llvm.vector.reduce.fmaximum.*’ intrinsics do a floating-point MAX reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

This instruction has the same comparison semantics as the ‘llvm.maximum.*’ intrinsic. That is, this intrinsic propagates NaNs and +0.0 is considered greater than -0.0. If any element of the vector is a NaN, the result is NaN.

Arguments:

The argument to this intrinsic must be a vector of floating-point values.

‘llvm.vector.reduce.fminimum.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vector.reduce.fminimum.v4f32(<4 x float> %a)
declare double @llvm.vector.reduce.fminimum.v2f64(<2 x double> %a)
Overview:

The ‘llvm.vector.reduce.fminimum.*’ intrinsics do a floating-point MIN reduction of a vector, returning the result as a scalar. The return type matches the element-type of the vector input.

This instruction has the same comparison semantics as the ‘llvm.minimum.*’ intrinsic. That is, this intrinsic propagates NaNs and -0.0 is considered less than +0.0. If any element of the vector is a NaN, the result is NaN.

Arguments:

The argument to this intrinsic must be a vector of floating-point values.
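
The two families of floating-point min/max reductions differ mainly in how they treat NaNs and signed zeros. The Python sketch below contrasts the MAX variants. It is a simplified model: only quiet NaNs are considered, the helper names are hypothetical, and the sign of a zero result from the maxnum-style reduction is deliberately left untested.

```python
import math
from functools import reduce


def maxnum(a, b):
    """maxnum-style comparison: a quiet NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def maximum(a, b):
    """maximum-style comparison: NaN propagates and +0.0 > -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b == 0.0:
        # Prefer the positive zero when the operands differ only in sign.
        return a if math.copysign(1.0, a) > 0 else b
    return max(a, b)


def reduce_fmax(vector):
    return reduce(maxnum, vector)


def reduce_fmaximum(vector):
    return reduce(maximum, vector)
```

For the input [1.0, NaN, 3.0], reduce_fmax yields 3.0 whereas reduce_fmaximum yields NaN. For [-0.0, +0.0], reduce_fmaximum yields +0.0.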

‘llvm.vector.insert’ Intrinsic

Syntax:

This is an overloaded intrinsic.

; Insert fixed type into scalable type
declare <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v4f32(<vscale x 4 x float> %vec, <4 x float> %subvec, i64 <idx>)
declare <vscale x 2 x double> @llvm.vector.insert.nxv2f64.v2f64(<vscale x 2 x double> %vec, <2 x double> %subvec, i64 <idx>)
; Insert scalable type into scalable type
declare <vscale x 4 x float> @llvm.vector.insert.nxv4f32.nxv2f32(<vscale x 4 x float> %vec, <vscale x 2 x float> %subvec, i64 <idx>)
; Insert fixed type into fixed type
declare <4 x double> @llvm.vector.insert.v4f64.v2f64(<4 x double> %vec, <2 x double> %subvec, i64 <idx>)
Overview:

The ‘llvm.vector.insert.*’ intrinsics insert a vector into another vector starting from a given index. The return type matches the type of the vector we insert into. Conceptually, this can be used to build a scalable vector out of non-scalable vectors, however this intrinsic can also be used on purely fixed types.

Scalable vectors can only be inserted into other scalable vectors.

Arguments:

The vec is the vector which subvec will be inserted into. The subvec is the vector that will be inserted.

idx represents the starting element number at which subvec will be inserted. idx must be a constant multiple of subvec’s known minimum vector length. If subvec is a scalable vector, idx is first scaled by the runtime scaling factor of subvec. The elements of vec starting at idx are overwritten with subvec. Elements idx through (idx + num_elements(subvec) - 1) must be valid vec indices. If this condition cannot be determined statically but is false at runtime, then the result vector is a poison value.
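
For fixed-width vectors, the insertion rule can be modelled with Python lists. This is a sketch with hypothetical names: POISON is merely a marker object standing in for a poison value, and the model returns it for every out-of-range insertion, although for purely fixed types the condition is statically known.

```python
POISON = object()  # marker standing in for an LLVM poison value


def vector_insert(vec, subvec, idx):
    """Overwrite the elements of `vec` starting at `idx` with `subvec`."""
    if idx % len(subvec) != 0:
        raise ValueError("idx must be a multiple of the subvector length")
    if idx + len(subvec) > len(vec):
        # Some of the overwritten indices would not be valid vec indices.
        return POISON
    result = list(vec)
    result[idx:idx + len(subvec)] = subvec
    return result
```

Inserting [9, 8] into [0, 1, 2, 3] at idx 2 yields [0, 1, 9, 8]. The input vector itself is left unmodified, matching the value semantics of the intrinsic.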

‘llvm.vector.extract’ Intrinsic

Syntax:

This is an overloaded intrinsic.

; Extract fixed type from scalable type
declare <4 x float> @llvm.vector.extract.v4f32.nxv4f32(<vscale x 4 x float> %vec, i64 <idx>)
declare <2 x double> @llvm.vector.extract.v2f64.nxv2f64(<vscale x 2 x double> %vec, i64 <idx>)
; Extract scalable type from scalable type
declare <vscale x 2 x float> @llvm.vector.extract.nxv2f32.nxv4f32(<vscale x 4 x float> %vec, i64 <idx>)
; Extract fixed type from fixed type
declare <2 x double> @llvm.vector.extract.v2f64.v4f64(<4 x double> %vec, i64 <idx>)
Overview:

The ‘llvm.vector.extract.*’ intrinsics extract a vector from within another vector starting from a given index. The return type must be explicitly specified. Conceptually, this can be used to decompose a scalable vector into non-scalable parts, however this intrinsic can also be used on purely fixed types.

Scalable vectors can only be extracted from other scalable vectors.

Arguments:

The vec is the vector from which we will extract a subvector.

The idx specifies the starting element number within vec from which a subvector is extracted. idx must be a constant multiple of the known-minimum vector length of the result type. If the result type is a scalable vector, idx is first scaled by the result type’s runtime scaling factor. Elements idx through (idx + num_elements(result_type) - 1) must be valid vector indices. If this condition cannot be determined statically but is false at runtime, then the result vector is a poison value. The idx parameter must be a vector index constant type (for most targets this will be an integer pointer type).
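
The scaling of idx for scalable result types can be illustrated with a small model in which a scalable vector is represented by a Python list of vscale times the minimum length. All names, including the vscale parameter and the POISON marker, are hypothetical.

```python
POISON = object()  # marker standing in for an LLVM poison value


def vector_extract(vec, result_min_len, idx, result_scalable=False, vscale=1):
    """Extract a subvector whose type has `result_min_len` minimum elements."""
    if idx % result_min_len != 0:
        raise ValueError("idx must be a multiple of the result's minimum length")
    # A scalable result type scales both the start index and the length.
    scale = vscale if result_scalable else 1
    start = idx * scale
    count = result_min_len * scale
    if start + count > len(vec):
        # Not all requested elements are valid indices at runtime.
        return POISON
    return vec[start:start + count]
```

With vscale equal to 2, a <vscale x 4 x T> value holds 8 elements. Extracting a <vscale x 2 x T> at idx 2 then starts at element 4 and returns 4 elements, while extracting a fixed <4 x T> at idx 4 returns the same elements without scaling. With vscale equal to 1 the latter extraction would be out of range, which is the runtime case that produces poison.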

‘llvm.vector.reverse’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <2 x i8> @llvm.vector.reverse.v2i8(<2 x i8> %a)
declare <vscale x 4 x i32> @llvm.vector.reverse.nxv4i32(<vscale x 4 x i32> %a)
Overview:

The ‘llvm.vector.reverse.*’ intrinsics reverse a vector. The intrinsic takes a single vector and returns a vector of matching type but with the original lane order reversed. These intrinsics work for both fixed and scalable vectors. While this intrinsic supports all vector types, the recommended way to express this operation for fixed-width vectors is still to use a shufflevector, as that may allow for more optimization opportunities.

Arguments:

The argument to this intrinsic must be a vector.

‘llvm.vector.deinterleave2/3/4/5/6/7/8’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare {<2 x double>, <2 x double>} @llvm.vector.deinterleave2.v4f64(<4 x double> %vec1)
declare {<vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> %vec1)
declare {<vscale x 2 x i8>, <vscale x 2 x i8>, <vscale x 2 x i8>} @llvm.vector.deinterleave3.nxv6i8(<vscale x 6 x i8> %vec1)
declare {<2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>} @llvm.vector.deinterleave5.v10i32(<10 x i32> %vec1)
declare {<2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>} @llvm.vector.deinterleave7.v14i32(<14 x i32> %vec1)
Overview:

The ‘llvm.vector.deinterleave2/3/4/5/6/7/8’ intrinsics deinterleave adjacent lanes into 2 through to 8 separate vectors, respectively, and return them as the result.

This intrinsic works for both fixed and scalable vectors. While this intrinsic supports all vector types, the recommended way to express this operation for factor of 2 on fixed-width vectors is still to use a shufflevector, as that may allow for more optimization opportunities.

For example:

{<2 x i64>, <2 x i64>} llvm.vector.deinterleave2.v4i64(<4 x i64> <i64 0, i64 1, i64 2, i64 3>); ==> {<2 x i64> <i64 0, i64 2>, <2 x i64> <i64 1, i64 3>}
{<2 x i32>, <2 x i32>, <2 x i32>} llvm.vector.deinterleave3.v6i32(<6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5>)  ; ==> {<2 x i32> <i32 0, i32 3>, <2 x i32> <i32 1, i32 4>, <2 x i32> <i32 2, i32 5>}
Arguments:

The argument is a vector whose type corresponds to the logical concatenation of the aggregated result types.

‘llvm.vector.interleave2/3/4/5/6/7/8’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <4 x double> @llvm.vector.interleave2.v4f64(<2 x double> %vec1, <2 x double> %vec2)
declare <vscale x 8 x i32> @llvm.vector.interleave2.nxv8i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2)
declare <vscale x 6 x i8> @llvm.vector.interleave3.nxv6i8(<vscale x 2 x i8> %vec0, <vscale x 2 x i8> %vec1, <vscale x 2 x i8> %vec2)
declare <10 x i32> @llvm.vector.interleave5.v10i32(<2 x i32> %vec0, <2 x i32> %vec1, <2 x i32> %vec2, <2 x i32> %vec3, <2 x i32> %vec4)
declare <14 x i32> @llvm.vector.interleave7.v14i32(<2 x i32> %vec0, <2 x i32> %vec1, <2 x i32> %vec2, <2 x i32> %vec3, <2 x i32> %vec4, <2 x i32> %vec5, <2 x i32> %vec6)
Overview:

The ‘llvm.vector.interleave2/3/4/5/6/7/8’ intrinsic constructs a vector by interleaving all the input vectors.

This intrinsic works for both fixed and scalable vectors. While this intrinsic supports all vector types, the recommended way to express this operation for factor of 2 on fixed-width vectors is still to use a shufflevector, as that may allow for more optimization opportunities.

For example:

<4 x i64> llvm.vector.interleave2.v4i64(<2 x i64> <i64 0, i64 2>, <2 x i64> <i64 1, i64 3>); ==> <4 x i64> <i64 0, i64 1, i64 2, i64 3>
<6 x i32> llvm.vector.interleave3.v6i32(<2 x i32> <i32 0, i32 3>, <2 x i32> <i32 1, i32 4>, <2 x i32> <i32 2, i32 5>) ; ==> <6 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5>
Arguments:

All arguments must be vectors of the same type whereby their logical concatenation matches the result type.
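
Interleaving and its inverse, the deinterleaving described in the preceding section, can be modelled on Python lists. This is an informal sketch with hypothetical function names, and the factor is not restricted to the range 2 through 8 as it is for the intrinsics.

```python
def interleave(*vectors):
    """Take lane 0 of every input, then lane 1 of every input, and so on."""
    return [lane for group in zip(*vectors) for lane in group]


def deinterleave(vector, factor):
    """Split `vector` into `factor` vectors by taking every factor-th lane."""
    return [vector[start::factor] for start in range(factor)]
```

The two operations are inverses of each other: deinterleave(interleave(a, b, c), 3) returns [a, b, c] for equally sized lists a, b and c.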

‘llvm.experimental.cttz.elts’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.experimental.cttz.elts on any vector of integer elements, both fixed width and scalable.

declare i8 @llvm.experimental.cttz.elts.i8.v8i1(<8 x i1> <src>, i1 <is_zero_poison>)
Overview:

The ‘llvm.experimental.cttz.elts’ intrinsic counts the number of trailing zero elements of a vector.

Arguments:

The first argument is the vector to be counted. This argument must be a vector with integer element type. The return type must also be an integer type which is wide enough to hold the maximum number of elements of the source vector. The behavior of this intrinsic is undefined if the return type is not wide enough for the number of elements in the input vector.

The second argument is a constant flag that indicates whether the intrinsic returns a valid result if the first argument is all zero. If the first argument is all zero and the second argument is true, the result is poison.

Semantics:

The ‘llvm.experimental.cttz.elts’ intrinsic counts the trailing (least significant) zero elements in a vector. If src == 0 the result is the number of elements in the input vector.
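
A minimal Python model of this counting rule follows. It assumes that the trailing (least significant) elements are those with the lowest lane indices, and it uses a hypothetical POISON marker object for the poison result.

```python
POISON = object()  # marker standing in for an LLVM poison value


def cttz_elts(src, is_zero_poison):
    """Count zero elements starting from lane 0 up to the first non-zero lane."""
    for index, element in enumerate(src):
        if element != 0:
            return index
    # Every element is zero.
    return POISON if is_zero_poison else len(src)
```

For example, cttz_elts([0, 0, 1, 0], False) is 2: the zero in the last lane is not counted, because counting stops at the first non-zero element.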

‘llvm.vector.splice’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <2 x double> @llvm.vector.splice.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm)
declare <vscale x 4 x i32> @llvm.vector.splice.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm)
Overview:

The ‘llvm.vector.splice.*’ intrinsics construct a vector by concatenating elements from the first input vector with elements of the second input vector, returning a vector of the same type as the input vectors. The signed immediate, modulo the number of elements in the vector, is the index into the first vector from which to extract the result value. This means conceptually that for a positive immediate, a vector is extracted from concat(%vec1, %vec2) starting at index imm, whereas for a negative immediate, it extracts -imm trailing elements from the first vector, and the remaining elements from %vec2.

These intrinsics work for both fixed and scalable vectors. While this intrinsic supports all vector types, the recommended way to express this operation for fixed-width vectors is still to use a shufflevector, as that may allow for more optimization opportunities.

For example:

llvm.vector.splice(<A,B,C,D>, <E,F,G,H>, 1);  ==> <B, C, D, E> index
llvm.vector.splice(<A,B,C,D>, <E,F,G,H>, -3); ==> <B, C, D, E> trailing elements
Arguments:

The first two operands are vectors with the same type. The start index is imm modulo the runtime number of elements in the source vector. For a fixed-width vector <N x eltty>, imm is a signed integer constant in the range -N <= imm < N. For a scalable vector <vscale x N x eltty>, imm is a signed integer constant in the range -X <= imm < X where X=vscale_range_min * N.
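
For fixed-width vectors, both the positive and the negative case can be modelled with list slicing. The function name below is hypothetical and the sketch only covers the fixed-width range -N <= imm < N.

```python
def vector_splice(vec1, vec2, imm):
    """Model of the splice operation on two equally sized lists."""
    n = len(vec1)
    if len(vec2) != n or not -n <= imm < n:
        raise ValueError("operands must match and imm must satisfy -N <= imm < N")
    if imm >= 0:
        # Window of N elements from concat(vec1, vec2), starting at imm.
        return (vec1 + vec2)[imm:imm + n]
    # Negative imm: the last -imm elements of vec1, then the start of vec2.
    trailing = -imm
    return vec1[n - trailing:] + vec2[:n - trailing]
```

Applied to the vectors from the example above, immediates 1 and -3 both yield <B, C, D, E>, because -3 modulo 4 is 1.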

‘llvm.stepvector’ Intrinsic

This is an overloaded intrinsic. You can use llvm.stepvector to generate a vector whose lane values comprise the linear sequence <0, 1, 2, …>. It is primarily intended for scalable vectors.

declare <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
declare <vscale x 8 x i16> @llvm.stepvector.nxv8i16()

The ‘llvm.stepvector’ intrinsics are used to create vectors of integers whose elements contain a linear sequence of values starting from 0 with a step of 1. This intrinsic can only be used for vectors with integer elements that are at least 8 bits in size. If the sequence value exceeds the allowed limit for the element type then the result for that lane is a poison value.

These intrinsics work for both fixed and scalable vectors. While this intrinsic supports all vector types, the recommended way to express this operation for fixed-width vectors is still to generate a constant vector instead.

Arguments:

None.

‘llvm.experimental.get.vector.length’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.experimental.get.vector.length.i32(i32 %cnt, i32 immarg %vf, i1 immarg %scalable)
declare i32 @llvm.experimental.get.vector.length.i64(i64 %cnt, i32 immarg %vf, i1 immarg %scalable)
Overview:

The ‘llvm.experimental.get.vector.length.*’ intrinsics take a number of elements to process and return how many of the elements can be processed with the requested vectorization factor.

Arguments:

The first argument is an unsigned value of any scalar integer type and specifies the total number of elements to be processed. The second argument is an i32 immediate for the vectorization factor. The third argument indicates if the vectorization factor should be multiplied by vscale.

Semantics:

Returns a non-negative i32 value (explicit vector length) that is unknown at compile time and depends on the hardware specification. If the result value does not fit in the result type, then the result is a poison value.

This intrinsic is intended to be used by loop vectorization with VP intrinsics in order to get the number of elements to process on each loop iteration. The result should be used to decrease the count for the next iteration until the count reaches zero.

Let %max_lanes be the number of lanes in the type described by %vf and %scalable. The returned value is subject to the following constraints:

  • If %cnt is equal to 0, returns 0.

  • The returned value is always less than or equal to %max_lanes.

  • The returned value is always greater than or equal to ceil(%cnt / ceil(%cnt / %max_lanes)), if %cnt is non-zero.

  • The returned values are monotonically non-increasing in each loop iteration. That is, the returned value of an iteration is at least as large as that of any later iteration.

Note that it has the following implications:

  • For a loop that uses this intrinsic, the number of iterations is equal to ceil(%C / %max_lanes) where %C is the initial %cnt value.

  • If %cnt is non-zero, the return value is non-zero as well.

  • If %cnt is less than or equal to %max_lanes, the return value is equal to %cnt.
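
The constraints leave the target some freedom. The Python sketch below shows two hypothetical policies that both satisfy them for the tested inputs, a greedy one and a balanced one, together with a strip-mined loop that consumes the result. All names are illustrative, and real hardware may choose differently.

```python
def ceil_div(a, b):
    """Ceiling of a / b for positive b, using integer arithmetic only."""
    return -(-a // b)


def lower_bound(cnt, max_lanes):
    """Smallest result permitted by the constraints above."""
    if cnt == 0:
        return 0
    return ceil_div(cnt, ceil_div(cnt, max_lanes))


def get_vector_length_greedy(cnt, max_lanes):
    """Process as many elements as possible in every iteration."""
    return min(cnt, max_lanes)


def get_vector_length_balanced(cnt, max_lanes):
    """Spread the remaining elements evenly over the remaining iterations."""
    return lower_bound(cnt, max_lanes)


def strip_mine(total, max_lanes, get_vector_length):
    """Return the vector length used in each iteration of the loop."""
    lengths = []
    cnt = total
    while cnt != 0:
        vl = get_vector_length(cnt, max_lanes)
        lengths.append(vl)
        cnt -= vl  # decrease the count for the next iteration
    return lengths
```

For 10 elements and 4 lanes, the greedy policy yields the lengths [4, 4, 2] and the balanced policy yields [4, 3, 3]. Both need ceil(10 / 4) = 3 iterations, and both sequences are non-increasing.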

llvm.experimental.vector.partial.reduce.add.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare<4xi32>@llvm.experimental.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4xi32>%a,<8xi32>%b)declare<4xi32>@llvm.experimental.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4xi32>%a,<16xi32>%b)declare<vscalex4xi32>@llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscalex4xi32>%a,<vscalex8xi32>%b)declare<vscalex4xi32>@llvm.experimental.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscalex4xi32>%a,<vscalex16xi32>%b)
Overview:

The ‘llvm.experimental.vector.partial.reduce.add.*’ intrinsics reduce the concatenation of the two vector arguments down to the number of elements of the result vector type.

Arguments:

The first argument is an integer vector with the same type as the result.

The second argument is a vector whose length is a known integer multiple of the length of the result type, with the same element type.

Semantics:

Other than the reduction operator (e.g. add), the way in which the concatenated arguments are reduced is entirely unspecified. By their nature these intrinsics are not expected to be useful in isolation but instead implement the first phase of an overall reduction operation.

The typical use case is loop vectorization where reductions are split into an in-loop phase, where maintaining an unordered vector result is important for performance, and an out-of-loop phase to calculate the final scalar result.

By avoiding the introduction of new ordering constraints, these intrinsics enhance the ability to leverage a target’s accumulation instructions.
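Because the lane assignment is unspecified, any model can only show one of many legal results. The Python sketch below folds the wide operand into the accumulator lane by lane (lane i of the result receives elements i, i + N, i + 2N, ...). This particular assignment is an assumption made for illustration, and integer wrap-around is ignored; the property callers may rely on is that a later full reduction of the result equals the reduction of both inputs.

```python
def partial_reduce_add(acc, wide):
    """Model one legal partial reduction of `wide` into `acc`."""
    n = len(acc)
    assert len(wide) % n == 0, "wide length must be a multiple of the result length"
    out = list(acc)
    for i, value in enumerate(wide):
        out[i % n] += value  # the lane choice is arbitrary; others are equally valid
    return out
```

For example, reducing the eight elements 0..7 into the accumulator [1, 2, 3, 4] gives [5, 8, 11, 14] under this assignment; the sum of the lanes, 38, is the same for every legal assignment.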

‘llvm.experimental.vector.histogram.*’ Intrinsic

These intrinsics are overloaded.

These intrinsics represent histogram-like operations; that is, updating values in memory that may not be contiguous, and where multiple elements within a single vector may be updating the same value in memory.

The update operation must be specified as part of the intrinsic name. For a simple histogram like the following, the add operation would be used.

void simple_histogram(int *restrict buckets, unsigned *indices, int N, int inc) {
  for (int i = 0; i < N; ++i)
    buckets[indices[i]] += inc;
}

More update operation types may be added in the future.

declare void @llvm.experimental.vector.histogram.add.v8p0.i32(<8 x ptr> %ptrs, i32 %inc, <8 x i1> %mask)
declare void @llvm.experimental.vector.histogram.add.nxv2p0.i64(<vscale x 2 x ptr> %ptrs, i64 %inc, <vscale x 2 x i1> %mask)
declare void @llvm.experimental.vector.histogram.uadd.sat.v8p0.i32(<8 x ptr> %ptrs, i32 %inc, <8 x i1> %mask)
declare void @llvm.experimental.vector.histogram.umax.v8p0.i32(<8 x ptr> %ptrs, i32 %val, <8 x i1> %mask)
declare void @llvm.experimental.vector.histogram.umin.v8p0.i32(<8 x ptr> %ptrs, i32 %val, <8 x i1> %mask)
Arguments:

The first argument is a vector of pointers to the memory locations to be updated. The second argument is a scalar used to update the value from memory; it must match the type of value to be updated. The final argument is a mask value to exclude locations from being modified.

Semantics:

The ‘llvm.experimental.vector.histogram.*’ intrinsics are used to perform updates on potentially overlapping values in memory. The intrinsics represent the following sequence of operations:

  1. Gather load from the ptrs operand, with element type matching that of the inc operand.

  2. Update of the values loaded from memory. In the case of the add update operation, this means:

    1. Perform a cross-vector histogram operation on the ptrs operand.

    2. Multiply the result by the inc operand.

    3. Add the result to the values loaded from memory.

  3. Scatter the result of the update operation to the memory locations from the ptrs operand.

The mask operand will apply to at least the gather and scatter operations.
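The sequence above can be sketched in Python for the add update operation. Memory is modelled as a dictionary from address to value, and the function name is hypothetical; this is a scalar reference model under those assumptions, not a description of how any target lowers the intrinsic.

```python
from collections import Counter


def histogram_add(memory, ptrs, inc, mask):
    """Apply the add histogram update to `memory` in place and return it."""
    active = [p for p, m in zip(ptrs, mask) if m]
    loaded = {p: memory[p] for p in active}       # 1. gather load
    counts = Counter(active)                      # 2.1 cross-vector histogram
    updated = {p: loaded[p] + counts[p] * inc     # 2.2 multiply, 2.3 add
               for p in loaded}
    memory.update(updated)                        # 3. scatter
    return memory
```

Lanes that point at the same address contribute once each, so an address that appears three times among the active lanes is increased by 3 * inc.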

‘llvm.experimental.vector.extract.last.active’ Intrinsic

This is an overloaded intrinsic.

declare i32 @llvm.experimental.vector.extract.last.active.v4i32(<4 x i32> %data, <4 x i1> %mask, i32 %passthru)
declare i16 @llvm.experimental.vector.extract.last.active.nxv8i16(<vscale x 8 x i16> %data, <vscale x 8 x i1> %mask, i16 %passthru)

Arguments:

The first argument is the data vector to extract a lane from. The second is a mask vector controlling the extraction. The third argument is a passthru value.

The two input vectors must have the same number of elements, and the type of the passthru value must match that of the elements of the data vector.

Semantics:

The ‘llvm.experimental.vector.extract.last.active’ intrinsic will extract an element from the data vector at the index matching the highest active lane of the mask vector. If no mask lanes are active then the passthru value is returned instead.
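As a minimal sketch of these semantics, the following Python function (a hypothetical scalar model, not an LLVM API) scans the mask from the highest lane downwards:

```python
def extract_last_active(data, mask, passthru):
    """Return the element at the highest active lane, or passthru if none."""
    assert len(data) == len(mask)
    for i in reversed(range(len(mask))):
        if mask[i]:
            return data[i]
    return passthru
```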

‘llvm.experimental.vector.compress.*’ Intrinsics

LLVM provides an intrinsic for compressing data within a vector based on a selection mask. Semantically, this is similar to llvm.masked.compressstore but with weaker assumptions and without storing the results to memory, i.e., the data remains in the vector.

Syntax:

This is an overloaded intrinsic. A number of scalar values of integer, floating point or pointer data type are collected from an input vector and placed adjacently within the result vector. A mask defines which elements to collect from the vector. The remaining lanes are filled with values from passthru.

declare <8 x i32> @llvm.experimental.vector.compress.v8i32(<8 x i32> <value>, <8 x i1> <mask>, <8 x i32> <passthru>)
declare <16 x float> @llvm.experimental.vector.compress.v16f32(<16 x float> <value>, <16 x i1> <mask>, <16 x float> undef)
Overview:

Selects elements from input vector value according to the mask. All selected elements are written into adjacent lanes in the result vector, from lower to higher. The mask holds an entry for each vector lane, and is used to select elements to be kept. If a passthru vector is given, all remaining lanes are filled with the corresponding lane’s value from passthru. The main difference to llvm.masked.compressstore is that we do not need to guard against memory access for unselected lanes. This allows for branchless code and better optimization for all targets that do not support, or have inefficient instructions for, the explicit semantics of llvm.masked.compressstore but still have some form of compress operations. The result vector can be written with a similar effect, as all the selected values are at the lower positions of the vector, but without requiring branches to avoid writes where the mask is false.

Arguments:

The first operand is the input vector, from which elements are selected. The second operand is the mask, a vector of boolean values. The third operand is the passthru vector, from which elements are filled into remaining lanes. The mask and the input vector must have the same number of vector elements. The input and passthru vectors must have the same type.

Semantics:

The llvm.experimental.vector.compress intrinsic compresses data within a vector. It collects elements from possibly non-adjacent lanes of a vector and places them contiguously in the result vector based on a selection mask, filling the remaining lanes with values from passthru. This intrinsic performs the logic of the following C++ example. All values in out after the last selected one are undefined if passthru is undefined. If all entries in the mask are 0, the out vector is passthru. If any element of the mask is poison, all elements of the result are poison. Otherwise, if any element of the mask is undef, all elements of the result are undef. If passthru is undefined, the number of valid lanes is equal to the number of true entries in the mask, i.e., all lanes >= number-of-selected-values are undefined.

// Consecutively place selected values in a vector.
using VecT __attribute__((vector_size(N))) = int;

VecT compress(VecT vec, VecT mask, VecT passthru) {
  VecT out;
  int idx = 0;
  for (int i = 0; i < N / sizeof(int); ++i) {
    out[idx] = vec[i];
    idx += static_cast<bool>(mask[i]);
  }
  for (; idx < N / sizeof(int); ++idx) {
    out[idx] = passthru[idx];
  }
  return out;
}

‘llvm.experimental.vector.match.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <<n> x i1> @llvm.experimental.vector.match(<<n> x <ty>> %op1, <<m> x <ty>> %op2, <<n> x i1> %mask)
declare <vscale x <n> x i1> @llvm.experimental.vector.match(<vscale x <n> x <ty>> %op1, <<m> x <ty>> %op2, <vscale x <n> x i1> %mask)
Overview:

Find active elements of the first argument matching any elements of the second.

Arguments:

The first argument is the search vector, the second argument the vector of elements we are searching for (i.e. for which we consider a match successful), and the third argument is a mask that controls which elements of the first argument are active. The first two arguments must be vectors of matching integer element types. The first and third arguments and the result type must have matching element counts (fixed or scalable). The second argument must be a fixed vector, but its length may be different from the remaining arguments.

Semantics:

The ‘llvm.experimental.vector.match’ intrinsic compares each active element in the first argument against the elements of the second argument, placing 1 in the corresponding element of the output vector if any equality comparison is successful, and 0 otherwise. Inactive elements in the mask are set to 0 in the output.
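A scalar model of this behaviour, written as a hypothetical Python helper for illustration, is:

```python
def vector_match(op1, op2, mask):
    """Per lane: True if the lane is active and its element occurs in op2."""
    assert len(op1) == len(mask)
    return [bool(m) and (x in op2) for x, m in zip(op1, mask)]
```

Here the third lane holds a value that does occur in the second argument, but it is inactive, so its result is 0.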

Matrix Intrinsics

Operations on matrixes requiring shape information (like number of rows/columns or the memory layout) can be expressed using the matrix intrinsics. These intrinsics require matrix dimensions to be passed as immediate arguments, and matrixes are passed and returned as vectors. This means that for a R x C matrix, element i of column j is at index j * R + i in the corresponding vector, with indices starting at 0. Currently column-major layout is assumed. The intrinsics support both integer and floating point matrixes.
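The index formula can be checked with a short Python sketch that flattens a matrix given as a list of rows into the column-major vector form. The helper name is hypothetical.

```python
def to_column_major(rows):
    """Flatten an R x C matrix (list of rows) into a column-major vector."""
    r, c = len(rows), len(rows[0])
    flat = [None] * (r * c)
    for i in range(r):
        for j in range(c):
            flat[j * r + i] = rows[i][j]  # element i of column j
    return flat
```

For the 2 x 3 matrix with rows [1, 2, 3] and [4, 5, 6], the vector is [1, 4, 2, 5, 3, 6].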

‘llvm.matrix.transpose.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare vectorty @llvm.matrix.transpose.*(vectorty %In, i32 <Rows>, i32 <Cols>)

Overview:

The ‘llvm.matrix.transpose.*’ intrinsics treat %In as a <Rows> x <Cols> matrix and return the transposed matrix in the result vector.

Arguments:

The first argument %In is a vector that corresponds to a <Rows> x <Cols> matrix. Thus, arguments <Rows> and <Cols> correspond to the number of rows and columns, respectively, and must be positive, constant integers. The returned vector must have <Rows> * <Cols> elements, and have the same float or integer element type as %In.
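As an illustration, the following Python sketch (a hypothetical reference model) transposes a column-major <Rows> x <Cols> vector into the column-major vector of the <Cols> x <Rows> result:

```python
def matrix_transpose(vec, rows, cols):
    """Transpose a column-major rows x cols matrix stored in `vec`."""
    assert len(vec) == rows * cols
    out = [None] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            # Input element (i, j) becomes element (j, i) of the
            # cols x rows result, whose columns have `cols` entries.
            out[i * cols + j] = vec[j * rows + i]
    return out
```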

‘llvm.matrix.multiply.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare vectorty @llvm.matrix.multiply.*(vectorty %A, vectorty %B, i32 <OuterRows>, i32 <Inner>, i32 <OuterColumns>)

Overview:

The ‘llvm.matrix.multiply.*’ intrinsics treat %A as a <OuterRows> x <Inner> matrix and %B as a <Inner> x <OuterColumns> matrix, and multiply them. The result matrix is returned in the result vector.

Arguments:

The first vector argument %A corresponds to a matrix with <OuterRows> * <Inner> elements, and the second argument %B to a matrix with <Inner> * <OuterColumns> elements. Arguments <OuterRows>, <Inner> and <OuterColumns> must be positive, constant integers. The returned vector must have <OuterRows> * <OuterColumns> elements. Vectors %A, %B, and the returned vector all have the same float or integer element type.
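The shape bookkeeping can be sketched with a hypothetical Python reference model that operates directly on the column-major vectors. It uses exact Python arithmetic, so it does not model integer wrap-around or floating-point rounding order.

```python
def matrix_multiply(a, b, outer_rows, inner, outer_cols):
    """Multiply column-major A (outer_rows x inner) by B (inner x outer_cols)."""
    assert len(a) == outer_rows * inner
    assert len(b) == inner * outer_cols
    out = [0] * (outer_rows * outer_cols)
    for j in range(outer_cols):
        for i in range(outer_rows):
            out[j * outer_rows + i] = sum(
                a[k * outer_rows + i] * b[j * inner + k] for k in range(inner)
            )
    return out
```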

‘llvm.matrix.column.major.load.*’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare vectorty @llvm.matrix.column.major.load.*(ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)

Overview:

The ‘llvm.matrix.column.major.load.*’ intrinsics load a <Rows> x <Cols> matrix using a stride of %Stride to compute the start address of the different columns. The offset is computed using %Stride’s bitwidth. This allows for convenient loading of sub matrixes. If <IsVolatile> is true, the intrinsic is considered a volatile memory access. The result matrix is returned in the result vector. If the %Ptr argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

Arguments:

The first argument %Ptr is a pointer type to the returned vector type, and corresponds to the start address to load from. The second argument %Stride is a positive, constant integer with %Stride >= <Rows>. %Stride is used to compute the column memory addresses. I.e., for a column C, its start memory address is calculated with %Ptr + C * %Stride. The third argument <IsVolatile> is a boolean value. The fourth and fifth arguments, <Rows> and <Cols>, correspond to the number of rows and columns, respectively, and must be positive, constant integers. The returned vector must have <Rows> * <Cols> elements.

The align parameter attribute can be provided for the %Ptr argument.
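To illustrate how the stride selects a sub matrix, the following Python sketch models memory as a flat list and %Ptr as an index into it. It assumes the stride is counted in elements; the helper name is hypothetical.

```python
def column_major_load(memory, ptr, stride, rows, cols):
    """Load a rows x cols matrix whose columns start `stride` elements apart."""
    assert stride >= rows
    out = []
    for c in range(cols):
        start = ptr + c * stride  # start address of column c
        out.extend(memory[start:start + rows])
    return out
```

With a 4 x 3 parent matrix stored column-major in memory (stride 4), loading a 2 x 2 matrix starting at element 1 picks rows 1 and 2 of the first two columns.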

‘llvm.matrix.column.major.store.*’ Intrinsic

Syntax:

declare void @llvm.matrix.column.major.store.*(vectorty %In, ptrty %Ptr, i64 %Stride, i1 <IsVolatile>, i32 <Rows>, i32 <Cols>)

Overview:

The ‘llvm.matrix.column.major.store.*’ intrinsics store the <Rows> x <Cols> matrix in %In to memory using a stride of %Stride between columns. The offset is computed using %Stride’s bitwidth. If <IsVolatile> is true, the intrinsic is considered a volatile memory access.

If the %Ptr argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.

Arguments:

The first argument %In is a vector that corresponds to a <Rows> x <Cols> matrix to be stored to memory. The second argument %Ptr is a pointer to the vector type of %In, and is the start address of the matrix in memory. The third argument %Stride is a positive, constant integer with %Stride >= <Rows>. %Stride is used to compute the column memory addresses. I.e., for a column C, its start memory address is calculated with %Ptr + C * %Stride. The fourth argument <IsVolatile> is a boolean value. The arguments <Rows> and <Cols> correspond to the number of rows and columns, respectively, and must be positive, constant integers.

The align parameter attribute can be provided for the %Ptr argument.

Half Precision Floating-Point Intrinsics

For most target platforms, half precision floating-point is a storage-only format. This means that it is a dense encoding (in memory) but does not support computation in the format.

This means that code must first load the half-precision floating-point value as an i16, then convert it to float with llvm.convert.from.fp16. Computation can then be performed on the float value (including extending to double etc). To store the value back to memory, it is first converted to float if needed, then converted to i16 with llvm.convert.to.fp16, then stored as an i16 value.
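The storage-only round trip can be mimicked in Python with the standard struct module, whose "e" format code is IEEE 754 binary16. This is only an analogy for the bit-level conversion: the helper names are hypothetical, rounding follows Python's struct implementation, and out-of-range inputs (for which struct raises OverflowError) are not modelled.

```python
import struct


def convert_to_fp16(value):
    """Return the 16-bit pattern of `value` rounded to half precision."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def convert_from_fp16(bits):
    """Return the float encoded by a 16-bit half-precision pattern."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For example, 1.0 is stored as the pattern 0x3C00, and the pattern 0x3800 converts back to 0.5.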

‘llvm.convert.to.fp16’ Intrinsic

Syntax:

declare i16 @llvm.convert.to.fp16.f32(float %a)
declare i16 @llvm.convert.to.fp16.f64(double %a)

Overview:

The ‘llvm.convert.to.fp16’ intrinsic function performs a conversion from a conventional floating-point type to half precision floating-point format.

Arguments:

The intrinsic function takes a single argument: the value to be converted.

Semantics:

The ‘llvm.convert.to.fp16’ intrinsic function performs a conversion from a conventional floating-point format to half precision floating-point format. The return value is an i16 which contains the converted number.

Examples:

%res = call i16 @llvm.convert.to.fp16.f32(float %a)
store i16 %res, ptr @x, align 2

‘llvm.convert.from.fp16’ Intrinsic

Syntax:

declare float @llvm.convert.from.fp16.f32(i16 %a)
declare double @llvm.convert.from.fp16.f64(i16 %a)

Overview:

The ‘llvm.convert.from.fp16’ intrinsic function performs a conversion from half precision floating-point format to single precision floating-point format.

Arguments:

The intrinsic function takes a single argument: the value to be converted.

Semantics:

The ‘llvm.convert.from.fp16’ intrinsic function performs a conversion from half precision floating-point format to single precision floating-point format. The input half-float value is represented by an i16 value.

Examples:

%a = load i16, ptr @x, align 2
%res = call float @llvm.convert.from.fp16.f32(i16 %a)

Saturating floating-point to integer conversions

The fptoui and fptosi instructions return a poison value if the rounded-towards-zero value is not representable by the result type. These intrinsics provide an alternative conversion, which will saturate towards the smallest and largest representable integer values instead.

‘llvm.fptoui.sat.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.fptoui.sat on any floating-point argument type and any integer result type, or vectors thereof. Not all targets may support all types, however.

declare i32 @llvm.fptoui.sat.i32.f32(float %f)
declare i19 @llvm.fptoui.sat.i19.f64(double %f)
declare <4 x i100> @llvm.fptoui.sat.v4i100.v4f128(<4 x fp128> %f)
Overview:

This intrinsic converts the argument into an unsigned integer using saturating semantics.

Arguments:

The argument may be any floating-point or vector of floating-point type. The return value may be any integer or vector of integer type. The number of vector elements in argument and return must be the same.

Semantics:

The conversion to integer is performed subject to the following rules:

  • If the argument is any NaN, zero is returned.

  • If the argument is smaller than zero (this includes negative infinity), zero is returned.

  • If the argument is larger than the largest representable unsigned integer of the result type (this includes positive infinity), the largest representable unsigned integer is returned.

  • Otherwise, the result of rounding the argument towards zero is returned.

Example:

%a = call i8 @llvm.fptoui.sat.i8.f32(float 123.875)            ; yields i8: 123
%b = call i8 @llvm.fptoui.sat.i8.f32(float -5.75)              ; yields i8:   0
%c = call i8 @llvm.fptoui.sat.i8.f32(float 377.0)              ; yields i8: 255
%d = call i8 @llvm.fptoui.sat.i8.f32(float 0xFFF8000000000000) ; yields i8:   0
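The rules can be written out as a small Python reference model. The function name and the `bits` parameter (the width of the result type) are hypothetical; the model applies the four rules in order to a scalar.

```python
import math


def fptoui_sat(x, bits):
    """Saturating float to unsigned integer conversion for a `bits`-wide result."""
    if math.isnan(x):
        return 0
    largest = (1 << bits) - 1
    if x <= 0:            # includes negative infinity
        return 0
    if x >= largest:      # includes positive infinity
        return largest
    return math.trunc(x)  # round towards zero
```

Applied to the inputs of the example with bits = 8, it gives the same results: 123, 0, 255 and 0.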

‘llvm.fptosi.sat.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.fptosi.sat on any floating-point argument type and any integer result type, or vectors thereof. Not all targets may support all types, however.

declare i32 @llvm.fptosi.sat.i32.f32(float %f)
declare i19 @llvm.fptosi.sat.i19.f64(double %f)
declare <4 x i100> @llvm.fptosi.sat.v4i100.v4f128(<4 x fp128> %f)
Overview:

This intrinsic converts the argument into a signed integer using saturating semantics.

Arguments:

The argument may be any floating-point or vector of floating-point type. The return value may be any integer or vector of integer type. The number of vector elements in argument and return must be the same.

Semantics:

The conversion to integer is performed subject to the following rules:

  • If the argument is any NaN, zero is returned.

  • If the argument is smaller than the smallest representable signed integer of the result type (this includes negative infinity), the smallest representable signed integer is returned.

  • If the argument is larger than the largest representable signed integer of the result type (this includes positive infinity), the largest representable signed integer is returned.

  • Otherwise, the result of rounding the argument towards zero is returned.

Example:

%a = call i8 @llvm.fptosi.sat.i8.f32(float 23.875)             ; yields i8:   23
%b = call i8 @llvm.fptosi.sat.i8.f32(float -130.75)            ; yields i8: -128
%c = call i8 @llvm.fptosi.sat.i8.f32(float 999.0)              ; yields i8:  127
%d = call i8 @llvm.fptosi.sat.i8.f32(float 0xFFF8000000000000) ; yields i8:    0
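A scalar Python reference model of the signed rules might look as follows. As before, the function name and the `bits` parameter are hypothetical and only serve the illustration.

```python
import math


def fptosi_sat(x, bits):
    """Saturating float to signed integer conversion for a `bits`-wide result."""
    if math.isnan(x):
        return 0
    smallest = -(1 << (bits - 1))
    largest = (1 << (bits - 1)) - 1
    if x <= smallest:     # includes negative infinity
        return smallest
    if x >= largest:      # includes positive infinity
        return largest
    return math.trunc(x)  # round towards zero
```

With bits = 8 it reproduces the results of the example: 23, -128, 127 and 0.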

Convergence Intrinsics

The LLVM convergence intrinsics for controlling the semantics of convergent operations, which all start with the llvm.experimental.convergence. prefix, are described in the Convergent Operation Semantics document.

Debugger Intrinsics

The LLVM debugger intrinsics (which all start with the llvm.dbg. prefix) are described in the LLVM Source Level Debugging document.

Exception Handling Intrinsics

The LLVM exception handling intrinsics (which all start with the llvm.eh. prefix) are described in the LLVM Exception Handling document.

Pointer Authentication Intrinsics

The LLVM pointer authentication intrinsics (which all start with the llvm.ptrauth. prefix) are described in the Pointer Authentication document.

Trampoline Intrinsics

These intrinsics make it possible to excise one parameter, marked with the nest attribute, from a function. The result is a callable function pointer lacking the nest parameter - the caller does not need to provide a value for it. Instead, the value to use is stored in advance in a “trampoline”, a block of memory usually allocated on the stack, which also contains code to splice the nest value into the argument list. This is used to implement the GCC nested function address extension.

For example, if the function is i32 f(ptr nest %c, i32 %x, i32 %y) then the resulting function pointer has signature i32 (i32, i32). It can be created as follows:

%tramp = alloca [10 x i8], align 4 ; size and alignment only correct for X86
call void @llvm.init.trampoline(ptr %tramp, ptr @f, ptr %nval)
%fp = call ptr @llvm.adjust.trampoline(ptr %tramp)

The call %val = call i32 %fp(i32 %x, i32 %y) is then equivalent to %val = call i32 @f(ptr %nval, i32 %x, i32 %y).
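At the level of call semantics only, a trampoline behaves like a partial application that binds the nest argument. The Python analogy below uses functools.partial; it is purely conceptual, says nothing about the generated machine code, and the names f and make_trampoline are hypothetical.

```python
from functools import partial


def f(nest, x, y):
    """Stand-in for a function whose first parameter is marked nest."""
    return nest["base"] + x * y


def make_trampoline(func, nval):
    """Bind the nest value in advance; callers pass only the other arguments."""
    return partial(func, nval)
```

Calling the result with (x, y) is then equivalent to calling f with (nval, x, y).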

‘llvm.init.trampoline’ Intrinsic

Syntax:

declare void @llvm.init.trampoline(ptr <tramp>, ptr <func>, ptr <nval>)

Overview:

This fills the memory pointed to by tramp with executable code, turning it into a trampoline.

Arguments:

The llvm.init.trampoline intrinsic takes three arguments, all pointers. The tramp argument must point to a sufficiently large and sufficiently aligned block of memory; this memory is written to by the intrinsic. Note that the size and the alignment are target-specific - LLVM currently provides no portable way of determining them, so a front-end that generates this intrinsic needs to have some target-specific knowledge.

The func argument must be a constant (potentially bitcasted) pointer to a function declaration or definition, since the calling convention may affect the content of the trampoline that is created.

Semantics:

The block of memory pointed to by tramp is filled with target dependent code, turning it into a function. Then tramp needs to be passed to llvm.adjust.trampoline to get a pointer which can be bitcast (to a new function) and called. The new function’s signature is the same as that of func with any arguments marked with the nest attribute removed. At most one such nest argument is allowed, and it must be of pointer type. Calling the new function is equivalent to calling func with the same argument list, but with nval used for the missing nest argument. If, after calling llvm.init.trampoline, the memory pointed to by tramp is modified, then the effect of any later call to the returned function pointer is undefined.

‘llvm.adjust.trampoline’ Intrinsic

Syntax:

declare ptr @llvm.adjust.trampoline(ptr <tramp>)

Overview:

This performs any required machine-specific adjustment to the address of a trampoline (passed as tramp).

Arguments:

tramp must point to a block of memory which already has trampoline code filled in by a previous call to llvm.init.trampoline.

Semantics:

On some architectures the address of the code to be executed needs to be different than the address where the trampoline is actually stored. This intrinsic returns the executable address corresponding to tramp after performing the required machine specific adjustments. The pointer returned can then be bitcast and executed.

Vector Predication Intrinsics

VP intrinsics are intended for predicated SIMD/vector code. A typical VP operation takes a vector mask and an explicit vector length parameter as in:

<W x T> llvm.vp.<opcode>.*(<W x T> %x, <W x T> %y, <W x i1> %mask, i32 %evl)

The vector mask parameter (%mask) always has a vector of i1 type, for example <32 x i1>. The explicit vector length parameter always has the type i32 and is an unsigned integer value. The explicit vector length parameter (%evl) is in the range:

0 <= %evl <= W, where W is the number of vector elements

Note that for scalable vector types W is the runtime length of the vector.

The VP intrinsic has undefined behavior if %evl > W. The explicit vector length (%evl) creates a mask, %EVLmask, with all elements 0 <= i < %evl set to True, and all other lanes %evl <= i < W to False. A new mask %M is calculated with an element-wise AND from %mask and %EVLmask:

%M = %mask AND %EVLmask

A vector operation <opcode> on vectors A and B calculates:

A <opcode> B = { A[i] <opcode> B[i]   if %M[i] = True, and
               { undef                otherwise
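The combination of the two masks can be sketched as a scalar Python model. The function name is hypothetical, and None is used as a stand-in for lanes whose value is not defined; since %evl > W is undefined behavior in the IR, the model simply rejects it.

```python
import operator


def vp_binop(op, a, b, mask, evl):
    """Apply `op` on lanes enabled by both the mask and the vector length."""
    w = len(a)
    assert len(b) == w and len(mask) == w
    if not 0 <= evl <= w:
        raise ValueError("%evl > W is undefined behavior in the IR")
    evl_mask = [i < evl for i in range(w)]
    m = [bool(mi) and ei for mi, ei in zip(mask, evl_mask)]
    # Disabled lanes are modelled as None.
    return [op(x, y) if on else None for x, y, on in zip(a, b, m)]
```

For example, with mask <1, 0, 1, 1> and %evl = 3 only lanes 0 and 2 are computed.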

Optimization Hint

Some targets, such as AVX512, do not support the %evl parameter in hardware.The use of an effective %evl is discouraged for those targets. The functionTargetTransformInfo::hasActiveVectorLength() returns true when the targethas native support for %evl.

‘llvm.vp.select.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.select.v16i32(<16 x i1> <condition>, <16 x i32> <on_true>, <16 x i32> <on_false>, i32 <evl>)
declare <vscale x 4 x i64> @llvm.vp.select.nxv4i64(<vscale x 4 x i1> <condition>, <vscale x 4 x i64> <on_true>, <vscale x 4 x i64> <on_false>, i32 <evl>)
Overview:

The ‘llvm.vp.select’ intrinsic is used to choose one value based on a condition vector, without IR-level branching.

Arguments:

The first argument is a vector of i1 and indicates the condition. The second argument is the value that is selected where the condition vector is true. The third argument is the value that is selected where the condition vector is false. The vectors must be of the same size. The fourth argument is the explicit vector length.

  1. The optional fast-math flags marker indicates that the select has one or more fast-math flags. These are optimization hints to enable otherwise unsafe floating-point optimizations. Fast-math flags are only valid for selects that return supported floating-point types.

Semantics:

The intrinsic selects lanes from the second and third argument depending on a condition vector.

All result lanes at positions greater than or equal to %evl are undefined. For all lanes below %evl where the condition vector is true the lane is taken from the second argument. Otherwise, the lane is taken from the third argument.

Example:

%r = call <4 x i32> @llvm.vp.select.v4i32(<4 x i1> %cond, <4 x i32> %on_true, <4 x i32> %on_false, i32 %evl)

;;; Expansion.
;; Any result is legal on lanes at and above %evl.
%also.r = select <4 x i1> %cond, <4 x i32> %on_true, <4 x i32> %on_false
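For comparison, a scalar Python model of the same semantics is sketched below. The function name is hypothetical, and None marks the lanes at and above %evl, for which any result is legal.

```python
def vp_select(cond, on_true, on_false, evl):
    """Per-lane select below evl; lanes at and above evl are undefined (None)."""
    assert len(cond) == len(on_true) == len(on_false)
    return [
        (t if c else f) if i < evl else None
        for i, (c, t, f) in enumerate(zip(cond, on_true, on_false))
    ]
```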

‘llvm.vp.merge.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.merge.v16i32(<16 x i1> <condition>, <16 x i32> <on_true>, <16 x i32> <on_false>, i32 <pivot>)
declare <vscale x 4 x i64> @llvm.vp.merge.nxv4i64(<vscale x 4 x i1> <condition>, <vscale x 4 x i64> <on_true>, <vscale x 4 x i64> <on_false>, i32 <pivot>)
Overview:

The ‘llvm.vp.merge’ intrinsic is used to choose one value based on a condition vector and an index argument, without IR-level branching.

Arguments:

The first argument is a vector of i1 and indicates the condition. The second argument is the value that is merged where the condition vector is true. The third argument is the value that is selected where the condition vector is false or the lane position is greater than or equal to the pivot. The fourth argument is the pivot.

  1. The optional fast-math flags marker indicates that the merge has one or more fast-math flags. These are optimization hints to enable otherwise unsafe floating-point optimizations. Fast-math flags are only valid for merges that return supported floating-point types.

Semantics:

The intrinsic selects lanes from the second and third argument depending on a condition vector and pivot value.

For all lanes where the condition vector is true and the lane position is less than %pivot the lane is taken from the second argument. Otherwise, the lane is taken from the third argument.

Example:

%r = call <4 x i32> @llvm.vp.merge.v4i32(<4 x i1> %cond, <4 x i32> %on_true, <4 x i32> %on_false, i32 %pivot)

;;; Expansion.
;; Lanes at and above %pivot are taken from %on_false
%atfirst = insertelement <4 x i32> poison, i32 %pivot, i32 0
%splat = shufflevector <4 x i32> %atfirst, <4 x i32> poison, <4 x i32> zeroinitializer
%pivotmask = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32> %splat
%mergemask = and <4 x i1> %cond, <4 x i1> %pivotmask
%also.r = select <4 x i1> %mergemask, <4 x i32> %on_true, <4 x i32> %on_false
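The same expansion can be mirrored by a scalar Python model (the function name is hypothetical). Unlike the select form, every result lane is defined, because lanes at and above the pivot come from the third argument.

```python
def vp_merge(cond, on_true, on_false, pivot):
    """Take on_true where cond holds and the lane is below pivot, else on_false."""
    assert len(cond) == len(on_true) == len(on_false)
    return [
        t if (c and i < pivot) else f
        for i, (c, t, f) in enumerate(zip(cond, on_true, on_false))
    ]
```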

‘llvm.vp.add.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.add.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.add.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer addition of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.add’ intrinsic performs integer addition (add) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = add <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.sub.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.sub.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.sub.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.sub.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer subtraction of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.sub’ intrinsic performs integer subtraction (sub) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.sub.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = sub <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.mul.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.mul.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.mul.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.mul.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer multiplication of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.mul’ intrinsic performs integer multiplication (mul) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.mul.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = mul <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.sdiv.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.sdiv.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.sdiv.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.sdiv.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated, signed division of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.sdiv’ intrinsic performs signed division (sdiv) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.sdiv.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = sdiv <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.udiv.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.udiv.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.udiv.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.udiv.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated, unsigned division of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.udiv’ intrinsic performs unsigned division (udiv) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.udiv.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = udiv <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.srem.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.srem.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.srem.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.srem.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated computation of the signed remainder of two integer vectors.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.srem’ intrinsic computes the remainder of the signed division (srem) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.srem.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = srem <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.urem.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.urem.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.urem.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.urem.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated computation of the unsigned remainder of two integer vectors.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.urem’ intrinsic computes the remainder of the unsigned division (urem) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.urem.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = urem <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
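
The signed and unsigned remainders differ only in how the operand bits are interpreted. As a rough illustration, the Python sketch below models both on i32 lanes. It is not LLVM code: POISON, BITS, to_unsigned, srem, urem and vp_binop are hypothetical names, poison is a sentinel string, and a lane is assumed to be enabled when its index is below the vector length and its mask bit is set.

```python
POISON = "poison"  # hypothetical sentinel for an LLVM poison value
BITS = 32          # the sketch models i32 lanes


def to_unsigned(x):
    """Reinterpret a Python int as an unsigned BITS-wide value."""
    return x & ((1 << BITS) - 1)


def srem(a, b):
    """Signed remainder: the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def urem(a, b):
    """Unsigned remainder of the two bit patterns."""
    return to_unsigned(a) % to_unsigned(b)


def vp_binop(op, a, b, mask, evl):
    """Apply op on enabled lanes; disabled lanes yield POISON."""
    return [
        op(x, y) if (i < evl and m) else POISON
        for i, (x, y, m) in enumerate(zip(a, b, mask))
    ]


a = [-7, 7, -7, 7]
b = [2, -2, 2, 2]
mask = [True, True, False, True]
signed = vp_binop(srem, a, b, mask, evl=4)
unsigned = vp_binop(urem, a, b, mask, evl=4)
```

With these inputs, lane 0 yields -1 for the signed remainder and 1 for the unsigned one, because the bit pattern of -7 is a large odd number when read as unsigned.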

‘llvm.vp.ashr.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.ashr.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.ashr.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Vector-predicated arithmetic right-shift.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.ashr’ intrinsic computes the arithmetic right shift (ashr) of the first argument by the second argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.ashr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = ashr <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.lshr.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.lshr.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.lshr.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Vector-predicated logical right-shift.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.lshr’ intrinsic computes the logical right shift (lshr) of the first argument by the second argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.lshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = lshr <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.shl.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.shl.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.shl.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.shl.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Vector-predicated left shift.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.shl’ intrinsic computes the left shift (shl) of the first argument by the second argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.shl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = shl <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
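
The three predicated shifts can be compared with a small model. The Python sketch below is an illustration under stated assumptions, not LLVM code: all names (POISON, BITS, to_signed, ashr, lshr, shl, vp_shift) are hypothetical, lanes are i32 values held as signed Python ints, and a lane is assumed to be enabled when its index is below the vector length and its mask bit is set. It also assumes that an enabled lane whose shift amount is at least the bit width yields poison, as the underlying ashr, lshr and shl instructions do.

```python
POISON = "poison"  # hypothetical sentinel for an LLVM poison value
BITS = 32          # the sketch models i32 lanes
ALL_ONES = (1 << BITS) - 1


def to_signed(x):
    """Reinterpret the low BITS bits of x as a signed value."""
    x &= ALL_ONES
    return x - (1 << BITS) if x >> (BITS - 1) else x


def ashr(x, s):
    """Arithmetic right shift: the sign bit is replicated."""
    return to_signed(x) >> s  # Python's >> on a negative int sign-extends


def lshr(x, s):
    """Logical right shift: zeros are shifted in from the top."""
    return to_signed((x & ALL_ONES) >> s)


def shl(x, s):
    """Left shift: bits shifted past the top are discarded."""
    return to_signed((x & ALL_ONES) << s)


def vp_shift(op, a, amounts, mask, evl):
    """Apply a shift on enabled lanes; disabled lanes yield POISON."""
    out = []
    for i, (x, s, m) in enumerate(zip(a, amounts, mask)):
        if i >= evl or not m:
            out.append(POISON)   # disabled lane
        elif s >= BITS:
            out.append(POISON)   # assumption: oversized shift amount
        else:
            out.append(op(x, s))
    return out


a = [-8, 8, -8, -8]
amounts = [1, 1, 1, 32]
mask = [True, True, False, True]
by_ashr = vp_shift(ashr, a, amounts, mask, evl=4)
by_lshr = vp_shift(lshr, a, amounts, mask, evl=4)
by_shl = vp_shift(shl, a, amounts, mask, evl=4)
```

Lane 0 shows the difference between the two right shifts: the arithmetic shift of -8 by one gives -4, while the logical shift gives 2147483644 because a zero is shifted into the sign position.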

‘llvm.vp.or.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.or.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.or.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.or.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Vector-predicated or.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.or’ intrinsic performs a bitwise or (or) of the first two arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.or.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = or <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.and.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.and.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.and.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.and.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Vector-predicated and.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.and’ intrinsic performs a bitwise and (and) of the first two arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.and.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = and <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.xor.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.xor.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.xor.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.xor.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Vector-predicated, bitwise xor.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.xor’ intrinsic performs a bitwise xor (xor) of the first two arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.xor.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = xor <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
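
The predicated bitwise operations share one lane-selection rule and differ only in the operator applied. A minimal Python sketch of that shared shape follows. It is illustrative and not LLVM code: POISON and vp_bitwise are hypothetical names, poison is a sentinel string, and a lane is assumed to be enabled when its index is below the vector length and its mask bit is set.

```python
import operator

POISON = "poison"  # hypothetical sentinel for an LLVM poison value


def vp_bitwise(op, a, b, mask, evl):
    """Apply a bitwise operator on enabled lanes; others yield POISON."""
    return [
        op(x, y) if (i < evl and m) else POISON
        for i, (x, y, m) in enumerate(zip(a, b, mask))
    ]


a = [0b1100, 0b1010, 0b1111, 0b0001]
b = [0b1010, 0b0110, 0b0000, 0b0001]
mask = [True, False, True, True]
by_or = vp_bitwise(operator.or_, a, b, mask, evl=3)
by_and = vp_bitwise(operator.and_, a, b, mask, evl=3)
by_xor = vp_bitwise(operator.xor, a, b, mask, evl=3)
# Lane 1 is masked off; lane 3 is at or above evl. Both are POISON
# regardless of which operator is applied.
```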

‘llvm.vp.abs.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.abs.v16i32(<16 x i32> <op>, i1 <is_int_min_poison>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.abs.nxv4i32(<vscale x 4 x i32> <op>, i1 <is_int_min_poison>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.abs.v256i64(<256 x i64> <op>, i1 <is_int_min_poison>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated abs of a vector of integers.

Arguments:

The first argument and the result have the same vector of integer type. The second argument must be a constant and is a flag to indicate whether the result value of the ‘llvm.vp.abs’ intrinsic is a poison value if the first argument is statically or dynamically an INT_MIN value. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.abs’ intrinsic computes the absolute value (abs) of the first argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.abs.v4i32(<4 x i32> %a, i1 false, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x i32> @llvm.abs.v4i32(<4 x i32> %a, i1 false)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
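
The effect of the is_int_min_poison flag is easiest to see on a lane that holds INT_MIN. The Python sketch below is an illustration, not LLVM code: POISON, INT_MIN and vp_abs are hypothetical names, lanes are i32 values, and a lane is assumed to be enabled when its index is below the vector length and its mask bit is set. It assumes that, with the flag clear, the absolute value of INT_MIN is INT_MIN itself, as for the unpredicated llvm.abs intrinsic.

```python
POISON = "poison"      # hypothetical sentinel for an LLVM poison value
INT_MIN = -(1 << 31)   # smallest i32 value


def vp_abs(a, is_int_min_poison, mask, evl):
    """Model of llvm.vp.abs on i32 lanes held in a Python list."""
    out = []
    for i, (x, m) in enumerate(zip(a, mask)):
        if i >= evl or not m:
            out.append(POISON)                    # disabled lane
        elif x == INT_MIN:
            # |INT_MIN| is not representable in i32.
            out.append(POISON if is_int_min_poison else INT_MIN)
        else:
            out.append(abs(x))
    return out


a = [-5, INT_MIN, 3, -1]
mask = [True, True, True, False]
wrapping = vp_abs(a, False, mask, evl=4)
poisoning = vp_abs(a, True, mask, evl=4)
```

The two results differ only in lane 1, which holds INT_MIN: it stays INT_MIN with the flag clear and becomes poison with the flag set.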

‘llvm.vp.smax.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.smax.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.smax.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.smax.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer signed maximum of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.smax’ intrinsic performs integer signed maximum (smax) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.smax.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x i32> @llvm.smax.v4i32(<4 x i32> %a, <4 x i32> %b)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.smin.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.smin.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.smin.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.smin.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer signed minimum of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.smin’ intrinsic performs integer signed minimum (smin) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.smin.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %a, <4 x i32> %b)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.umax.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.umax.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.umax.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.umax.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer unsigned maximum of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.umax’ intrinsic performs integer unsigned maximum (umax) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.umax.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %a, <4 x i32> %b)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.umin.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.umin.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.umin.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.umin.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer unsigned minimum of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.umin’ intrinsic performs integer unsigned minimum (umin) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x i32> @llvm.vp.umin.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %a, <4 x i32> %b)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
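
The signed and unsigned minimum and maximum intrinsics select between the same two operands but order them differently. The Python sketch below models all four on i32 lanes. It is illustrative and not LLVM code: POISON, BITS, to_unsigned and vp_minmax are hypothetical names, lanes are held as signed Python ints, and a lane is assumed to be enabled when its index is below the vector length and its mask bit is set.

```python
POISON = "poison"  # hypothetical sentinel for an LLVM poison value
BITS = 32          # the sketch models i32 lanes


def to_unsigned(x):
    """Reinterpret a Python int as an unsigned BITS-wide value."""
    return x & ((1 << BITS) - 1)


def vp_minmax(kind, a, b, mask, evl):
    """kind is one of 'smax', 'smin', 'umax', 'umin'."""
    pick = max if kind in ("smax", "umax") else min
    # Unsigned variants compare the bit patterns; signed ones compare as-is.
    order = to_unsigned if kind in ("umax", "umin") else (lambda v: v)
    return [
        pick(x, y, key=order) if (i < evl and m) else POISON
        for i, (x, y, m) in enumerate(zip(a, b, mask))
    ]


a = [-1, 5, -3, 2]
b = [1, 7, -4, 9]
mask = [True, True, True, False]
smax = vp_minmax("smax", a, b, mask, evl=4)
umax = vp_minmax("umax", a, b, mask, evl=4)
smin = vp_minmax("smin", a, b, mask, evl=4)
umin = vp_minmax("umin", a, b, mask, evl=4)
```

Lane 0 compares -1 with 1. The signed maximum is 1, but the unsigned maximum is -1, since its bit pattern is the largest unsigned i32 value.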

‘llvm.vp.copysign.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.copysign.v16f32(<16 x float> <mag_op>, <16 x float> <sign_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.copysign.nxv4f32(<vscale x 4 x float> <mag_op>, <vscale x 4 x float> <sign_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.copysign.v256f64(<256 x double> <mag_op>, <256 x double> <sign_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point copysign of two vectors of floating-point values.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.copysign’ intrinsic performs floating-point copysign (copysign) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.copysign.v4f32(<4 x float> %mag, <4 x float> %sign, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x float> @llvm.copysign.v4f32(<4 x float> %mag, <4 x float> %sign)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.minnum.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.minnum.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.minnum.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.minnum.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point IEEE-754-2008 minNum of two vectors of floating-point values.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.minnum’ intrinsic performs floating-point minimum (minnum) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.minnum.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x float> @llvm.minnum.v4f32(<4 x float> %a, <4 x float> %b)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.maxnum.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.maxnum.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.maxnum.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.maxnum.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point IEEE-754-2008 maxNum of two vectors of floating-point values.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.maxnum’ intrinsic performs floating-point maximum (maxnum) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.maxnum.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x float> @llvm.maxnum.v4f32(<4 x float> %a, <4 x float> %b)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.minimum.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.minimum.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.minimum.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.minimum.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point minimum of two vectors of floating-point values, propagating NaNs and treating -0.0 as less than +0.0.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.minimum’ intrinsic performs floating-point minimum (minimum) of the first and second vector arguments on each enabled lane, the result being NaN if either argument is a NaN. -0.0 is considered to be less than +0.0 for this intrinsic. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.minimum.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x float> @llvm.minimum.v4f32(<4 x float> %a, <4 x float> %b)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison
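
The minnum and minimum flavours differ in how they treat NaN operands and signed zeros. The Python sketch below contrasts the two on a few lanes. It is an illustration under stated assumptions, not LLVM code: POISON, minnum, minimum and vp_lanes are hypothetical names, Python floats (f64) stand in for the lane type, only quiet NaNs are considered, and the model does not pin down which zero minnum returns when the operands are +0.0 and -0.0. A lane is assumed to be enabled when its index is below the vector length and its mask bit is set.

```python
import math

POISON = "poison"  # hypothetical sentinel for an LLVM poison value


def minnum(x, y):
    """minNum-style minimum: a single (quiet) NaN operand is ignored."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return min(x, y)


def minimum(x, y):
    """NaN-propagating minimum that orders -0.0 below +0.0."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0 and y == 0.0:
        # Both operands are zeros; prefer the negative one.
        return x if math.copysign(1.0, x) < 0 else y
    return min(x, y)


def vp_lanes(op, a, b, mask, evl):
    """Apply op on enabled lanes; disabled lanes yield POISON."""
    return [
        op(x, y) if (i < evl and m) else POISON
        for i, (x, y, m) in enumerate(zip(a, b, mask))
    ]


a = [math.nan, 1.0, 0.0, 2.0]
b = [1.0, math.nan, -0.0, 3.0]
mask = [True, True, True, True]
by_minnum = vp_lanes(minnum, a, b, mask, evl=3)
by_minimum = vp_lanes(minimum, a, b, mask, evl=3)
```

In lanes 0 and 1 the minnum model returns 1.0 while the minimum model returns NaN. In lane 2 the minimum model returns -0.0. Lane 3 is at or above the vector length and is poison in both results.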

‘llvm.vp.maximum.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.maximum.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.maximum.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.maximum.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point maximum of two vectors of floating-point values, propagating NaNs and treating -0.0 as less than +0.0.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.maximum’ intrinsic performs floating-point maximum (maximum) of the first and second vector arguments on each enabled lane, the result being NaN if either argument is a NaN. -0.0 is considered to be less than +0.0 for this intrinsic. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.maximum.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x float> @llvm.maximum.v4f32(<4 x float> %a, <4 x float> %b)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fadd.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fadd.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fadd.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.fadd.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point addition of two vectors of floating-point values.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fadd’ intrinsic performs floating-point addition (fadd) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.fadd.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = fadd <4 x float> %a, %b
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fsub.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fsub.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fsub.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.fsub.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point subtraction of two vectors of floating-point values.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fsub’ intrinsic performs floating-point subtraction (fsub) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.fsub.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = fsub <4 x float> %a, %b
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fmul.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fmul.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fmul.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.fmul.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point multiplication of two vectors of floating-point values.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fmul’ intrinsic performs floating-point multiplication (fmul) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.fmul.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = fmul <4 x float> %a, %b
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fdiv.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fdiv.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fdiv.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.fdiv.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point division of two vectors of floating-point values.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fdiv’ intrinsic performs floating-point division (fdiv) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.fdiv.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = fdiv <4 x float> %a, %b
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.frem.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.frem.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.frem.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.frem.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point remainder of two vectors of floating-point values.

Arguments:

The first two arguments and the result have the same vector of floating-point type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.frem’ intrinsic performs floating-point remainder (frem) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.frem.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = frem <4 x float> %a, %b
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fneg.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fneg.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fneg.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.fneg.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point negation of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fneg’ intrinsic performs floating-point negation (fneg) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x float> @llvm.vp.fneg.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = fneg <4 x float> %a
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fabs.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fabs.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fabs.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.fabs.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point absolute value of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fabs’ intrinsic performs floating-point absolute value (fabs) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:

%r = call <4 x float> @llvm.vp.fabs.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x float> @llvm.fabs.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.sqrt.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.sqrt.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.sqrt.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.sqrt.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point square root of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.sqrt’ intrinsic performs floating-point square root (sqrt) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.sqrt.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fma.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fma.v16f32(<16 x float> <left_op>, <16 x float> <middle_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fma.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <middle_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.fma.v256f64(<256 x double> <left_op>, <256 x double> <middle_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point fused multiply-add of three vectors of floating-point values.

Arguments:

The first three arguments and the result have the same vector of floating-point type. The fourth argument is the vector mask and has the same number of elements as the result vector type. The fifth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fma’ intrinsic performs floating-point fused multiply-add (llvm.fma) of the first, second, and third vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.fma.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fmuladd.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fmuladd.v16f32(<16 x float> <left_op>, <16 x float> <middle_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fmuladd.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <middle_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.fmuladd.v256f64(<256 x double> <left_op>, <256 x double> <middle_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point multiply-add of three vectors of floating-point values that can be fused if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) the fused operation is more efficient than the equivalent, separate pair of mul and add instructions.

Arguments:

The first three arguments and the result have the same vector of floating-point type. The fourth argument is the vector mask and has the same number of elements as the result vector type. The fifth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fmuladd’ intrinsic performs floating-point multiply-add (llvm.fmuladd) of the first, second, and third vector arguments on each enabled lane. The result on disabled lanes is a poison value. The operation is performed in the default floating-point environment.

Examples:

%r = call <4 x float> @llvm.vp.fmuladd.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.fmuladd.v4f32(<4 x float> %a, <4 x float> %b, <4 x float> %c)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison
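
Because the code generator may or may not fuse the multiply and the add, the two forms can produce different results for the same inputs. The Python sketch below illustrates this on a single lane. It is only a model: it works in double precision (Python floats) rather than the `float` type of the example above, it handles finite inputs only, and the helper names `fused_mul_add` and `unfused_mul_add` are hypothetical.

```python
from fractions import Fraction

def fused_mul_add(a, b, c):
    # Exact rational arithmetic followed by one rounding to double.
    # This models a fused operation for finite inputs only.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def unfused_mul_add(a, b, c):
    # Separate multiply and add: the product is rounded before the addition.
    return a * b + c

a = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)

# a * a is exactly 1 + 2**-26 + 2**-54; the last term is lost when the
# product is rounded on its own, but survives in the fused form.
fused = fused_mul_add(a, a, c)
unfused = unfused_mul_add(a, a, c)
```

Code that depends on one specific rounding behaviour should therefore not rely on ‘llvm.vp.fmuladd’ to provide it.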

‘llvm.vp.reduce.add.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.add.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.add.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer ADD reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.add’ intrinsic performs the integer ADD reduction (llvm.vector.reduce.add) of the vector argument val on each enabled lane, adding it to the scalar start_value. Disabled lanes are treated as containing the neutral value 0 (i.e. having no effect on the reduction operation). If the vector length is zero, the result is equal to start_value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> zeroinitializer
%reduction = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %masked.a)
%also.r = add i32 %reduction, %start
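
As an illustration of the expansion above, here is a small Python reference model. It is a sketch under stated assumptions: the function name `vp_reduce_add` and the `bits` parameter are hypothetical, a lane counts as enabled when its mask bit is set and its index is below `evl`, and integer addition wraps modulo 2**bits.

```python
def vp_reduce_add(start, vec, mask, evl, bits=32):
    """Model of a predicated integer ADD reduction with wrapping arithmetic."""
    modulus = 1 << bits
    acc = start % modulus
    for i, (x, m) in enumerate(zip(vec, mask)):
        # Disabled lanes contribute the neutral value 0, i.e. they are skipped.
        if m and i < evl:
            acc = (acc + x) % modulus
    return acc

# Lane 1 is masked off and lane 3 is beyond evl: 10 + 1 + 3.
total = vp_reduce_add(10, [1, 2, 3, 4], [True, False, True, True], evl=3)
```

With `evl=0` no lane is enabled and the function returns the start value unchanged.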

‘llvm.vp.reduce.fadd.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vp.reduce.fadd.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare double @llvm.vp.reduce.fadd.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point ADD reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar floating-point type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of floating-point values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.fadd’ intrinsic performs the floating-point ADD reduction (llvm.vector.reduce.fadd) of the vector argument val on each enabled lane, adding it to the scalar start_value. Disabled lanes are treated as containing the neutral value -0.0 (i.e. having no effect on the reduction operation). If no lanes are enabled, the resulting value will be equal to start_value.

To ignore the start value, the neutral value can be used.

See the unpredicated version (llvm.vector.reduce.fadd) for more detail on the semantics of the reduction.

Examples:

%r = call float @llvm.vp.reduce.fadd.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>
%also.r = call float @llvm.vector.reduce.fadd.v4f32(float %start, <4 x float> %masked.a)
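
The choice of -0.0 rather than +0.0 as the neutral value matters for the sign of a zero result. The Python sketch below models the expansion above as a sequential (ordered) reduction in double precision; the function name `vp_reduce_fadd` is hypothetical, and reassociated reductions are not modelled.

```python
import math

def vp_reduce_fadd(start, vec, mask, evl):
    """Ordered model: disabled lanes are replaced by the neutral value -0.0."""
    acc = start
    for i, (x, m) in enumerate(zip(vec, mask)):
        acc = acc + (x if (m and i < evl) else -0.0)
    return acc

# With every lane disabled, a start value of -0.0 keeps its sign.
r = vp_reduce_fadd(-0.0, [1.0, 2.0], [False, False], evl=2)

# Adding +0.0 instead would flip the sign of a -0.0 accumulator.
sign_with_positive_zero = math.copysign(1.0, -0.0 + 0.0)
```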

‘llvm.vp.reduce.mul.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.mul.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.mul.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer MUL reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.mul’ intrinsic performs the integer MUL reduction (llvm.vector.reduce.mul) of the vector argument val on each enabled lane, multiplying it by the scalar start_value. Disabled lanes are treated as containing the neutral value 1 (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i32 @llvm.vp.reduce.mul.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
%reduction = call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %masked.a)
%also.r = mul i32 %reduction, %start

‘llvm.vp.reduce.fmul.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vp.reduce.fmul.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare double @llvm.vp.reduce.fmul.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point MUL reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar floating-point type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of floating-point values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.fmul’ intrinsic performs the floating-point MUL reduction (llvm.vector.reduce.fmul) of the vector argument val on each enabled lane, multiplying it by the scalar start_value. Disabled lanes are treated as containing the neutral value 1.0 (i.e. having no effect on the reduction operation). If no lanes are enabled, the resulting value will be equal to the starting value.

To ignore the start value, the neutral value can be used.

See the unpredicated version (llvm.vector.reduce.fmul) for more detail on the semantics.

Examples:

%r = call float @llvm.vp.reduce.fmul.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>
%also.r = call float @llvm.vector.reduce.fmul.v4f32(float %start, <4 x float> %masked.a)

‘llvm.vp.reduce.and.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.and.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.and.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer AND reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.and’ intrinsic performs the integer AND reduction (llvm.vector.reduce.and) of the vector argument val on each enabled lane, performing an ‘and’ of that with the scalar start_value. Disabled lanes are treated as containing the neutral value UINT_MAX, or -1 (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i32 @llvm.vp.reduce.and.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%reduction = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %masked.a)
%also.r = and i32 %reduction, %start
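
The role of the all-ones neutral value can be seen in a small Python model of the expansion above. This is a sketch, not LLVM code: the name `vp_reduce_and` and the `bits` parameter are hypothetical, and values are treated as unsigned bit patterns of the given width.

```python
def vp_reduce_and(start, vec, mask, evl, bits=32):
    """Model of a predicated AND reduction; disabled lanes act as all-ones."""
    all_ones = (1 << bits) - 1
    acc = start & all_ones
    for i, (x, m) in enumerate(zip(vec, mask)):
        lane = x if (m and i < evl) else all_ones
        acc &= lane & all_ones
    return acc

# Lane 1 (value 0x00) is masked off, so it cannot clear the result.
r = vp_reduce_and(0xFF, [0x0F, 0x00, 0x3C], [True, False, True], evl=3)
```

Passing the all-ones value as the start value leaves only the enabled lanes in the result.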

‘llvm.vp.reduce.or.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.or.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.or.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer OR reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.or’ intrinsic performs the integer OR reduction (llvm.vector.reduce.or) of the vector argument val on each enabled lane, performing an ‘or’ of that with the scalar start_value. Disabled lanes are treated as containing the neutral value 0 (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i32 @llvm.vp.reduce.or.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
%reduction = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %masked.a)
%also.r = or i32 %reduction, %start

‘llvm.vp.reduce.xor.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.xor.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.xor.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated integer XOR reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.xor’ intrinsic performs the integer XOR reduction (llvm.vector.reduce.xor) of the vector argument val on each enabled lane, performing an ‘xor’ of that with the scalar start_value. Disabled lanes are treated as containing the neutral value 0 (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i32 @llvm.vp.reduce.xor.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
%reduction = call i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %masked.a)
%also.r = xor i32 %reduction, %start

‘llvm.vp.reduce.smax.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.smax.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.smax.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated signed-integer MAX reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.smax’ intrinsic performs the signed-integer MAX reduction (llvm.vector.reduce.smax) of the vector argument val on each enabled lane, taking the maximum of that and the scalar start_value. Disabled lanes are treated as containing the neutral value INT_MIN (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i8 @llvm.vp.reduce.smax.v4i8(i8 %start, <4 x i8> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i8> %a, <4 x i8> <i8 -128, i8 -128, i8 -128, i8 -128>
%reduction = call i8 @llvm.vector.reduce.smax.v4i8(<4 x i8> %masked.a)
%also.r = call i8 @llvm.smax.i8(i8 %reduction, i8 %start)
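
The i8 example above uses -128 because that is INT_MIN for an 8-bit signed integer. A minimal Python model of the same expansion follows; the name `vp_reduce_smax` and the `bits` parameter are hypothetical, and inputs are assumed to already lie in the signed range of the element type.

```python
def vp_reduce_smax(start, vec, mask, evl, bits=8):
    """Model of a predicated signed MAX reduction; disabled lanes act as INT_MIN."""
    int_min = -(1 << (bits - 1))
    acc = start
    for i, (x, m) in enumerate(zip(vec, mask)):
        acc = max(acc, x if (m and i < evl) else int_min)
    return acc

# Lane 2 holds the largest value (120) but is masked off.
r = vp_reduce_smax(-5, [-100, 7, 120, -3], [True, True, False, True], evl=4)
```

Passing INT_MIN as the start value makes the result depend on the enabled lanes only.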

‘llvm.vp.reduce.smin.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.smin.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.smin.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated signed-integer MIN reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.smin’ intrinsic performs the signed-integer MIN reduction (llvm.vector.reduce.smin) of the vector argument val on each enabled lane, taking the minimum of that and the scalar start_value. Disabled lanes are treated as containing the neutral value INT_MAX (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i8 @llvm.vp.reduce.smin.v4i8(i8 %start, <4 x i8> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i8> %a, <4 x i8> <i8 127, i8 127, i8 127, i8 127>
%reduction = call i8 @llvm.vector.reduce.smin.v4i8(<4 x i8> %masked.a)
%also.r = call i8 @llvm.smin.i8(i8 %reduction, i8 %start)

‘llvm.vp.reduce.umax.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.umax.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.umax.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated unsigned-integer MAX reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.umax’ intrinsic performs the unsigned-integer MAX reduction (llvm.vector.reduce.umax) of the vector argument val on each enabled lane, taking the maximum of that and the scalar start_value. Disabled lanes are treated as containing the neutral value 0 (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i32 @llvm.vp.reduce.umax.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
%reduction = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %masked.a)
%also.r = call i32 @llvm.umax.i32(i32 %reduction, i32 %start)

‘llvm.vp.reduce.umin.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare i32 @llvm.vp.reduce.umin.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare i16 @llvm.vp.reduce.umin.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated unsigned-integer MIN reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar integer type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of integer values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.umin’ intrinsic performs the unsigned-integer MIN reduction (llvm.vector.reduce.umin) of the vector argument val on each enabled lane, taking the minimum of that and the scalar start_value. Disabled lanes are treated as containing the neutral value UINT_MAX, or -1 (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

To ignore the start value, the neutral value can be used.

Examples:

%r = call i32 @llvm.vp.reduce.umin.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%reduction = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %masked.a)
%also.r = call i32 @llvm.umin.i32(i32 %reduction, i32 %start)
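
The text names the neutral value both as UINT_MAX and as -1 because the two are the same bit pattern at a fixed width. The Python sketch below makes that explicit; the helper names `to_unsigned` and `vp_reduce_umin` and the `bits` parameter are hypothetical.

```python
def to_unsigned(x, bits=32):
    """Reinterpret an integer as an unsigned value of the given width."""
    return x & ((1 << bits) - 1)

def vp_reduce_umin(start, vec, mask, evl, bits=32):
    """Model of a predicated unsigned MIN reduction; disabled lanes act as UINT_MAX."""
    uint_max = (1 << bits) - 1
    acc = to_unsigned(start, bits)
    for i, (x, m) in enumerate(zip(vec, mask)):
        lane = to_unsigned(x, bits) if (m and i < evl) else uint_max
        acc = min(acc, lane)
    return acc

# A start value of -1 is UINT_MAX when read as unsigned, so it is neutral here.
r = vp_reduce_umin(-1, [5, 3, 9], [True, False, True], evl=3)
```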

‘llvm.vp.reduce.fmax.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vp.reduce.fmax.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare double @llvm.vp.reduce.fmax.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point MAX reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar floating-point type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of floating-point values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.fmax’ intrinsic performs the floating-point MAX reduction (llvm.vector.reduce.fmax) of the vector argument val on each enabled lane, taking the maximum of that and the scalar start_value. Disabled lanes are treated as containing the neutral value (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

The neutral value is dependent on the fast-math flags. If no flags are set, the neutral value is -QNAN. If nnan and ninf are both set, then the neutral value is the smallest floating-point value for the result type. If only nnan is set then the neutral value is -Infinity.

This instruction has the same comparison semantics as the llvm.vector.reduce.fmax intrinsic (and thus the ‘llvm.maxnum.*’ intrinsic).

To ignore the start value, the neutral value can be used.

Examples:

%r = call float @llvm.vp.reduce.fmax.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float QNAN, float QNAN, float QNAN, float QNAN>
%reduction = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %masked.a)
%also.r = call float @llvm.maxnum.f32(float %reduction, float %start)
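
A quiet NaN works as the neutral value here because a maxnum-style comparison returns the other operand when one operand is NaN. The Python sketch below models the case with no fast-math flags set. It is an approximation: the names `maxnum` and `vp_reduce_fmax` are hypothetical helpers, the start value is folded into the accumulator rather than applied last, and signaling NaNs and the ordering of signed zeros are not modelled.

```python
import math

def maxnum(a, b):
    """Simplified maxnum: a NaN operand is ignored in favour of the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def vp_reduce_fmax(start, vec, mask, evl):
    """Model of a predicated fmax reduction; disabled lanes act as quiet NaN."""
    acc = start
    for i, (x, m) in enumerate(zip(vec, mask)):
        acc = maxnum(acc, x if (m and i < evl) else math.nan)
    return acc

# Lane 1 holds the largest value (9.0) but is masked off.
r = vp_reduce_fmax(1.0, [3.0, 9.0, -2.0], [True, False, True], evl=3)
```

Passing NaN as the start value makes the result depend on the enabled lanes only.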

‘llvm.vp.reduce.fmin.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vp.reduce.fmin.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare double @llvm.vp.reduce.fmin.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point MIN reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar floating-point type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of floating-point values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.fmin’ intrinsic performs the floating-point MIN reduction (llvm.vector.reduce.fmin) of the vector argument val on each enabled lane, taking the minimum of that and the scalar start_value. Disabled lanes are treated as containing the neutral value (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

The neutral value is dependent on the fast-math flags. If no flags are set, the neutral value is +QNAN. If nnan and ninf are both set, then the neutral value is the largest floating-point value for the result type. If only nnan is set then the neutral value is +Infinity.

This instruction has the same comparison semantics as the llvm.vector.reduce.fmin intrinsic (and thus the ‘llvm.minnum.*’ intrinsic).

To ignore the start value, the neutral value can be used.

Examples:

%r = call float @llvm.vp.reduce.fmin.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float QNAN, float QNAN, float QNAN, float QNAN>
%reduction = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %masked.a)
%also.r = call float @llvm.minnum.f32(float %reduction, float %start)

‘llvm.vp.reduce.fmaximum.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vp.reduce.fmaximum.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare double @llvm.vp.reduce.fmaximum.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point MAX reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar floating-point type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of floating-point values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.fmaximum’ intrinsic performs the floating-point MAX reduction (llvm.vector.reduce.fmaximum) of the vector argument val on each enabled lane, taking the maximum of that and the scalar start_value. Disabled lanes are treated as containing the neutral value (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

The neutral value is dependent on the fast-math flags. If no flags are set or only nnan is set, the neutral value is -Infinity. If ninf is set, then the neutral value is the smallest floating-point value for the result type.

This instruction has the same comparison semantics as the llvm.vector.reduce.fmaximum intrinsic (and thus the ‘llvm.maximum.*’ intrinsic). That is, the result will always be a number unless any of the elements in the vector or the starting value is NaN. Namely, this intrinsic propagates NaN. Also, -0.0 is considered less than +0.0.

To ignore the start value, the neutral value can be used.

Examples:

%r = call float @llvm.vp.reduce.fmaximum.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float -infinity, float -infinity, float -infinity, float -infinity>
%reduction = call float @llvm.vector.reduce.fmaximum.v4f32(<4 x float> %masked.a)
%also.r = call float @llvm.maximum.f32(float %reduction, float %start)
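
The NaN propagation and signed-zero ordering described above can be sketched in Python as follows. This models the case with no fast-math flags set, where the neutral value is -Infinity. The names `maximum` and `vp_reduce_fmaximum` are hypothetical helpers, and the start value is folded into the accumulator rather than applied last.

```python
import math

def maximum(a, b):
    """NaN-propagating maximum in which -0.0 is ordered below +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Equal values include the +0.0 / -0.0 pair; prefer the positive sign.
        return a if math.copysign(1.0, a) > 0 else b
    return max(a, b)

def vp_reduce_fmaximum(start, vec, mask, evl):
    """Model of a predicated fmaximum reduction; disabled lanes act as -inf."""
    acc = start
    for i, (x, m) in enumerate(zip(vec, mask)):
        acc = maximum(acc, x if (m and i < evl) else -math.inf)
    return acc

# A NaN in a disabled lane is ignored; in an enabled lane it propagates.
masked_nan = vp_reduce_fmaximum(1.0, [3.0, math.nan], [True, False], evl=2)
enabled_nan = vp_reduce_fmaximum(1.0, [3.0, math.nan], [True, True], evl=2)
```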

‘llvm.vp.reduce.fminimum.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare float @llvm.vp.reduce.fminimum.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
declare double @llvm.vp.reduce.fminimum.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

Overview:

Predicated floating-point MIN reduction of a vector and a scalar starting value, returning the result as a scalar.

Arguments:

The first argument is the start value of the reduction, which must be a scalar floating-point type equal to the result type. The second argument is the vector on which the reduction is performed and must be a vector of floating-point values whose element type is the result/start type. The third argument is the vector mask and is a vector of boolean values with the same number of elements as the vector argument. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.reduce.fminimum’ intrinsic performs the floating-point MIN reduction (llvm.vector.reduce.fminimum) of the vector argument val on each enabled lane, taking the minimum of that and the scalar start_value. Disabled lanes are treated as containing the neutral value (i.e. having no effect on the reduction operation). If the vector length is zero, the result is the start value.

The neutral value is dependent on the fast-math flags. If no flags are set or only nnan is set, the neutral value is +Infinity. If ninf is set, then the neutral value is the largest floating-point value for the result type.

This instruction has the same comparison semantics as the llvm.vector.reduce.fminimum intrinsic (and thus the ‘llvm.minimum.*’ intrinsic). That is, the result will always be a number unless any of the elements in the vector or the starting value is NaN. Namely, this intrinsic propagates NaN. Also, -0.0 is considered less than +0.0.

To ignore the start value, the neutral value can be used.

Examples:

%r = call float @llvm.vp.reduce.fminimum.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
; %r is equivalent to %also.r, where lanes greater than or equal to %evl
; are treated as though %mask were false for those lanes.

%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float infinity, float infinity, float infinity, float infinity>
%reduction = call float @llvm.vector.reduce.fminimum.v4f32(<4 x float> %masked.a)
%also.r = call float @llvm.minimum.f32(float %reduction, float %start)

‘llvm.get.active.lane.mask.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %base, i32 %n)
declare <8 x i1> @llvm.get.active.lane.mask.v8i1.i64(i64 %base, i64 %n)
declare <16 x i1> @llvm.get.active.lane.mask.v16i1.i64(i64 %base, i64 %n)
declare <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %base, i64 %n)

Overview:

Create a mask representing active and inactive vector lanes.

Arguments:

Both arguments have the same scalar integer type. The result is a vector with the i1 element type.

Semantics:

The ‘llvm.get.active.lane.mask.*’ intrinsics are semantically equivalent to:

%m[i] = icmp ult (%base + i), %n

where %m is a vector (mask) of active/inactive lanes with its elements indexed by i, and %base, %n are the two arguments to llvm.get.active.lane.mask.*, icmp is an integer compare and ult the unsigned less-than comparison operator. Overflow cannot occur in (%base + i) and its comparison against %n as it is performed in integer numbers and not in machine numbers. If %n is 0, then the result is a poison value. The above is equivalent to:

%m = @llvm.get.active.lane.mask(%base, %n)

This can, for example, be emitted by the loop vectorizer in which case %base is the first element of the vector induction variable (VIV) and %n is the loop tripcount. Thus, these intrinsics perform an element-wise less than comparison of VIV with the loop tripcount, producing a mask of true/false values representing active/inactive vector lanes, except if the VIV overflows in which case they return false in the lanes where the VIV overflows. The arguments are scalar types to accommodate scalable vector types, for which it is unknown what the type of the step vector needs to be that enumerates its lanes without overflow.

This mask %m can e.g. be used in masked load/store instructions. These intrinsics provide a hint to the backend. I.e., for a vector loop, the back-edge taken count of the original scalar loop is explicit as the second argument.

Examples:
%active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %elem0, i64 429)
%wide.masked.load = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %3, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
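
The lane-wise comparison can be modelled with unbounded integers, which is what "integer numbers and not machine numbers" amounts to. The following Python sketch is illustrative only; the function name get_active_lane_mask and the lanes parameter are hypothetical, and the poison result for %n equal to 0 is represented by raising an error.

```python
def get_active_lane_mask(base, n, lanes):
    """Hypothetical model of %m[i] = icmp ult (%base + i), %n."""
    if n == 0:
        # The text above states that the result is poison in this case.
        raise ValueError("n == 0: result is poison")
    # Python integers do not wrap, so base + i cannot overflow.
    return [base + i < n for i in range(lanes)]

# Last iteration of a loop with trip count 429, vector width 4:
print(get_active_lane_mask(428, 429, 4))  # [True, False, False, False]

# Near the top of the i32 range the sum does not wrap around to 0,
# so the trailing lanes stay inactive.
print(get_active_lane_mask(2**32 - 2, 2**32 - 1, 4))  # [True, False, False, False]
```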

‘llvm.experimental.vp.splice’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <2 x double> @llvm.experimental.vp.splice.v2f64(<2 x double> %vec1, <2 x double> %vec2, i32 %imm, <2 x i1> %mask, i32 %evl1, i32 %evl2)
declare <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, i32 %imm, <vscale x 4 x i1> %mask, i32 %evl1, i32 %evl2)

Overview:

The ‘llvm.experimental.vp.splice.*’ intrinsic is the vector length predicated version of the ‘llvm.vector.splice.*’ intrinsic.

Arguments:

The result and the first two arguments vec1 and vec2 are vectors with the same type. The third argument imm is an immediate signed integer that indicates the offset index. The fourth argument mask is a vector mask and has the same number of elements as the result. The last two arguments evl1 and evl2 are unsigned integers indicating the explicit vector lengths of vec1 and vec2 respectively. imm, evl1 and evl2 should respect the following constraints: -evl1 <= imm < evl1, 0 <= evl1 <= VL and 0 <= evl2 <= VL, where VL is the runtime vector factor. If these constraints are not satisfied the intrinsic has undefined behavior.

Semantics:

Effectively, this intrinsic concatenates vec1[0..evl1-1] and vec2[0..evl2-1] and creates the result vector by selecting the elements in a window of size evl2, starting at index imm (for a positive immediate) of the concatenated vector. Elements in the result vector beyond evl2 are undef. If imm is negative the starting index is evl1 + imm. The result vector of active vector length evl2 contains evl1 - imm (-imm for negative imm) elements from indices [imm..evl1 - 1] ([evl1 + imm..evl1 - 1] for negative imm) of vec1 followed by the first evl2 - (evl1 - imm) (evl2 + imm for negative imm) elements of vec2. If evl1 - imm (-imm) >= evl2, only the first evl2 elements are considered and the remaining are undef. The lanes in the result vector disabled by mask are poison.

Examples:
llvm.experimental.vp.splice(<A,B,C,D>, <E,F,G,H>, 1, 2, 3);  ==> <B, E, F, poison> index
llvm.experimental.vp.splice(<A,B,C,D>, <E,F,G,H>, -2, 3, 2); ==> <B, C, poison, poison> trailing elements
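
The window selection can be sketched as follows. This Python model is an illustration under stated assumptions, not LLVM code: the name vp_splice is hypothetical, a single POISON sentinel stands in for both undef and poison lanes, and a constraint violation (undefined behavior in the IR) is reported as an exception. The mask, which the two examples above omit, is passed as all-true.

```python
POISON = "poison"  # simplification: one sentinel for both undef and poison

def vp_splice(vec1, vec2, imm, mask, evl1, evl2):
    """Hypothetical reference model of the splice window selection."""
    vl = len(vec1)
    if not (len(vec2) == vl and len(mask) == vl):
        raise ValueError("vec1, vec2 and mask must have the same length")
    if not (-evl1 <= imm < evl1 and 0 <= evl1 <= vl and 0 <= evl2 <= vl):
        raise ValueError("constraints violated (undefined behavior in the IR)")
    concat = vec1[:evl1] + vec2[:evl2]
    start = imm if imm >= 0 else evl1 + imm
    window = concat[start:start + evl2]  # always exactly evl2 elements
    return [window[i] if (i < evl2 and mask[i]) else POISON for i in range(vl)]

all_true = [True] * 4
print(vp_splice(list("ABCD"), list("EFGH"), 1, all_true, 2, 3))
print(vp_splice(list("ABCD"), list("EFGH"), -2, all_true, 3, 2))
```

With an all-true mask the two calls reproduce the results listed in the examples above.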

‘llvm.experimental.vp.splat’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <2 x double> @llvm.experimental.vp.splat.v2f64(double %scalar, <2 x i1> %mask, i32 %evl)
declare <vscale x 4 x i32> @llvm.experimental.vp.splat.nxv4i32(i32 %scalar, <vscale x 4 x i1> %mask, i32 %evl)

Overview:

The ‘llvm.experimental.vp.splat.*’ intrinsic creates a predicated splat with a specific effective vector length.

Arguments:

The result is a vector and it is a splat of the first scalar argument. The second argument mask is a vector mask and has the same number of elements as the result. The third argument is the explicit vector length of the operation.

Semantics:

This intrinsic splats a vector with evl elements of a scalar argument. The lanes in the result vector disabled by mask are poison. The elements past evl are poison.

Examples:
%r = call <4 x float> @llvm.experimental.vp.splat.v4f32(float %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%e = insertelement <4 x float> poison, float %a, i32 0
%s = shufflevector <4 x float> %e, <4 x float> poison, <4 x i32> zeroinitializer
%also.r = select <4 x i1> %mask, <4 x float> %s, <4 x float> poison

‘llvm.experimental.vp.reverse’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <2 x double> @llvm.experimental.vp.reverse.v2f64(<2 x double> %vec, <2 x i1> %mask, i32 %evl)
declare <vscale x 4 x i32> @llvm.experimental.vp.reverse.nxv4i32(<vscale x 4 x i32> %vec, <vscale x 4 x i1> %mask, i32 %evl)

Overview:

The ‘llvm.experimental.vp.reverse.*’ intrinsic is the vector length predicated version of the ‘llvm.vector.reverse.*’ intrinsic.

Arguments:

The result and the first argument vec are vectors with the same type. The second argument mask is a vector mask and has the same number of elements as the result. The third argument is the explicit vector length of the operation.

Semantics:

This intrinsic reverses the order of the first evl elements in a vector. The lanes in the result vector disabled by mask are poison. The elements past evl are poison.
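
As an illustration of the semantics above, a minimal Python sketch follows. The function name vp_reverse and the POISON sentinel are hypothetical, and the model assumes that the mask is applied to result lanes.

```python
POISON = "poison"  # stand-in for a poison lane

def vp_reverse(vec, mask, evl):
    """Hypothetical model: reverse the first evl elements, lane by lane."""
    return [
        vec[evl - 1 - i] if (i < evl and mask[i]) else POISON
        for i in range(len(vec))
    ]

print(vp_reverse([1, 2, 3, 4], [True] * 4, 3))  # [3, 2, 1, 'poison']
```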

‘llvm.vp.load’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <4 x float> @llvm.vp.load.v4f32.p0(ptr %ptr, <4 x i1> %mask, i32 %evl)
declare <vscale x 2 x i16> @llvm.vp.load.nxv2i16.p0(ptr %ptr, <vscale x 2 x i1> %mask, i32 %evl)
declare <8 x float> @llvm.vp.load.v8f32.p1(ptr addrspace(1) %ptr, <8 x i1> %mask, i32 %evl)
declare <vscale x 1 x i64> @llvm.vp.load.nxv1i64.p6(ptr addrspace(6) %ptr, <vscale x 1 x i1> %mask, i32 %evl)

Overview:

The ‘llvm.vp.load.*’ intrinsic is the vector length predicated version of the llvm.masked.load intrinsic.

Arguments:

The first argument is the base pointer for the load. The second argument is a vector of boolean values with the same number of elements as the return type. The third is the explicit vector length of the operation. The return type and underlying type of the base pointer are the same vector types.

The align parameter attribute can be provided for the first argument.

Semantics:

The ‘llvm.vp.load’ intrinsic reads a vector from memory in the same way as the ‘llvm.masked.load’ intrinsic, where the mask is taken from the combination of the ‘mask’ and ‘evl’ arguments in the usual VP way. Certain ‘llvm.masked.load’ arguments do not have corresponding arguments in ‘llvm.vp.load’: the ‘passthru’ argument is implicitly poison; the ‘alignment’ argument is taken as the align parameter attribute, if provided. The default alignment is taken as the ABI alignment of the return type as specified by the datalayout string.

Examples:
%r = call <8 x i8> @llvm.vp.load.v8i8.p0(ptr align 2 %ptr, <8 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%also.r = call <8 x i8> @llvm.masked.load.v8i8.p0(ptr %ptr, i32 2, <8 x i1> %mask, <8 x i8> poison)
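
The combination of ‘mask’ and ‘evl’ can be pictured with a small model in which a lane is active only when its index is below evl and its mask bit is set. The Python sketch below is illustrative only: vp_load is a hypothetical name, memory is modelled as a list of elements indexed from a base position, and alignment is not modelled.

```python
POISON = "poison"  # models the implicit poison passthru

def vp_load(memory, base, mask, evl):
    """Hypothetical model: inactive lanes are poison and never touch memory."""
    return [
        memory[base + i] if (i < evl and m) else POISON
        for i, m in enumerate(mask)
    ]

print(vp_load([10, 20, 30, 40, 50], 1, [True, False, True, True], 3))
# [20, 'poison', 40, 'poison']
```

Because inactive lanes are never evaluated, the model can read a vector that extends past the end of memory as long as those lanes are disabled.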

‘llvm.vp.store’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare void @llvm.vp.store.v4f32.p0(<4 x float> %val, ptr %ptr, <4 x i1> %mask, i32 %evl)
declare void @llvm.vp.store.nxv2i16.p0(<vscale x 2 x i16> %val, ptr %ptr, <vscale x 2 x i1> %mask, i32 %evl)
declare void @llvm.vp.store.v8f32.p1(<8 x float> %val, ptr addrspace(1) %ptr, <8 x i1> %mask, i32 %evl)
declare void @llvm.vp.store.nxv1i64.p6(<vscale x 1 x i64> %val, ptr addrspace(6) %ptr, <vscale x 1 x i1> %mask, i32 %evl)

Overview:

The ‘llvm.vp.store.*’ intrinsic is the vector length predicated version of the llvm.masked.store intrinsic.

Arguments:

The first argument is the vector value to be written to memory. The second argument is the base pointer for the store. It has the same underlying type as the value argument. The third argument is a vector of boolean values with the same number of elements as the value argument. The fourth is the explicit vector length of the operation.

The align parameter attribute can be provided for the second argument.

Semantics:

The ‘llvm.vp.store’ intrinsic writes a vector to memory in the same way as the ‘llvm.masked.store’ intrinsic, where the mask is taken from the combination of the ‘mask’ and ‘evl’ arguments in the usual VP way. The alignment of the operation (corresponding to the ‘alignment’ argument of ‘llvm.masked.store’) is specified by the align parameter attribute (see above). If it is not provided then the ABI alignment of the type of the ‘value’ argument as specified by the datalayout string is used instead.

Examples:
call void @llvm.vp.store.v8i8.p0(<8 x i8> %val, ptr align 4 %ptr, <8 x i1> %mask, i32 %evl)
;; For all lanes below %evl, the call above is lane-wise equivalent to the call below.
call void @llvm.masked.store.v8i8.p0(<8 x i8> %val, ptr %ptr, i32 4, <8 x i1> %mask)

‘llvm.experimental.vp.strided.load’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <4 x float> @llvm.experimental.vp.strided.load.v4f32.i64(ptr %ptr, i64 %stride, <4 x i1> %mask, i32 %evl)
declare <vscale x 2 x i16> @llvm.experimental.vp.strided.load.nxv2i16.i64(ptr %ptr, i64 %stride, <vscale x 2 x i1> %mask, i32 %evl)

Overview:

The ‘llvm.experimental.vp.strided.load’ intrinsic loads, into a vector, scalar values from memory locations evenly spaced apart by ‘stride’ number of bytes, starting from ‘ptr’.

Arguments:

The first argument is the base pointer for the load. The second argument is the stride value expressed in bytes. The third argument is a vector of boolean values with the same number of elements as the return type. The fourth is the explicit vector length of the operation. The base pointer underlying type matches the type of the scalar elements of the return value.

The align parameter attribute can be provided for the first argument.

Semantics:

The ‘llvm.experimental.vp.strided.load’ intrinsic loads, into a vector, multiple scalar values from memory in the same way as the llvm.vp.gather intrinsic, where the vector of pointers is in the form:

%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ...>,

with ‘ptr’ previously cast to a pointer to ‘i8’, ‘stride’ always interpreted as a signed integer and all arithmetic occurring in the pointer type.

Examples:
%r = call <8 x i64> @llvm.experimental.vp.strided.load.v8i64.i64(i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
;; The operation can also be expressed like this:
%addr = bitcast i64* %ptr to i8*
;; Create a vector of pointers %addrs in the form:
;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
%ptrs = bitcast <8 x i8*> %addrs to <8 x i64*>
%also.r = call <8 x i64> @llvm.vp.gather.v8i64.v8p0i64(<8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
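
The address computation can be sketched in Python as follows. This is only an illustrative model: vp_strided_load is a hypothetical name, memory is a bytes object addressed by byte offset, elements are assumed to be unsigned little-endian integers of elem_size bytes, and pointer-width wraparound is not modelled.

```python
POISON = "poison"  # stand-in for a poison lane

def vp_strided_load(memory, base, stride, mask, evl, elem_size):
    """Hypothetical model: lane i reads elem_size bytes at base + i * stride."""
    out = []
    for i, m in enumerate(mask):
        if i < evl and m:
            addr = base + i * stride  # stride is a signed byte distance
            out.append(int.from_bytes(memory[addr:addr + elem_size], "little"))
        else:
            out.append(POISON)
    return out

memory = bytes([10, 11, 12, 13, 14, 15, 16, 17])
print(vp_strided_load(memory, 1, 2, [True, True, False, True], 3, 1))
# A negative stride walks backwards through memory.
print(vp_strided_load(memory, 6, -2, [True] * 4, 4, 1))
```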

‘llvm.experimental.vp.strided.store’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare void @llvm.experimental.vp.strided.store.v4f32.i64(<4 x float> %val, ptr %ptr, i64 %stride, <4 x i1> %mask, i32 %evl)
declare void @llvm.experimental.vp.strided.store.nxv2i16.i64(<vscale x 2 x i16> %val, ptr %ptr, i64 %stride, <vscale x 2 x i1> %mask, i32 %evl)

Overview:

The ‘@llvm.experimental.vp.strided.store’ intrinsic stores the elements of ‘val’ into memory locations evenly spaced apart by ‘stride’ number of bytes, starting from ‘ptr’.

Arguments:

The first argument is the vector value to be written to memory. The second argument is the base pointer for the store. Its underlying type matches the scalar element type of the value argument. The third argument is the stride value expressed in bytes. The fourth argument is a vector of boolean values with the same number of elements as the value argument. The fifth is the explicit vector length of the operation.

The align parameter attribute can be provided for the second argument.

Semantics:

The ‘llvm.experimental.vp.strided.store’ intrinsic stores the elements of ‘val’ in the same way as the llvm.vp.scatter intrinsic, where the vector of pointers is in the form:

%ptrs = <%ptr, %ptr + %stride, %ptr + 2 * %stride, ...>,

with ‘ptr’ previously cast to a pointer to ‘i8’, ‘stride’ always interpreted as a signed integer and all arithmetic occurring in the pointer type.

Examples:
call void @llvm.experimental.vp.strided.store.v8i64.i64(<8 x i64> %val, i64* %ptr, i64 %stride, <8 x i1> %mask, i32 %evl)
;; The operation can also be expressed like this:
%addr = bitcast i64* %ptr to i8*
;; Create a vector of pointers %addrs in the form:
;; %addrs = <%addr, %addr + %stride, %addr + 2 * %stride, ...>
%ptrs = bitcast <8 x i8*> %addrs to <8 x i64*>
call void @llvm.vp.scatter.v8i64.v8p0i64(<8 x i64> %val, <8 x i64*> %ptrs, <8 x i1> %mask, i32 %evl)
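
A store-side sketch of the same addressing scheme, again purely illustrative: vp_strided_store is a hypothetical name, memory is a mutable bytearray, and elements are assumed to be unsigned little-endian integers of elem_size bytes.

```python
def vp_strided_store(memory, val, base, stride, mask, evl, elem_size):
    """Hypothetical model: active lane i writes val[i] at base + i * stride."""
    for i, (v, m) in enumerate(zip(val, mask)):
        if i < evl and m:
            addr = base + i * stride
            memory[addr:addr + elem_size] = v.to_bytes(elem_size, "little")

memory = bytearray(12)
# Three 2-byte elements, 4 bytes apart; evl = 2 disables the last lane.
vp_strided_store(memory, [0x0102, 0x0304, 0x0506], 0, 4, [True] * 3, 2, 2)
print(list(memory))  # [2, 1, 0, 0, 4, 3, 0, 0, 0, 0, 0, 0]
```

Memory at the addresses of inactive lanes is left unchanged.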

‘llvm.vp.gather’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare <4 x double> @llvm.vp.gather.v4f64.v4p0(<4 x ptr> %ptrs, <4 x i1> %mask, i32 %evl)
declare <vscale x 2 x i8> @llvm.vp.gather.nxv2i8.nxv2p0(<vscale x 2 x ptr> %ptrs, <vscale x 2 x i1> %mask, i32 %evl)
declare <2 x float> @llvm.vp.gather.v2f32.v2p2(<2 x ptr addrspace(2)> %ptrs, <2 x i1> %mask, i32 %evl)
declare <vscale x 4 x i32> @llvm.vp.gather.nxv4i32.nxv4p4(<vscale x 4 x ptr addrspace(4)> %ptrs, <vscale x 4 x i1> %mask, i32 %evl)

Overview:

The ‘llvm.vp.gather.*’ intrinsic is the vector length predicated version of the llvm.masked.gather intrinsic.

Arguments:

The first argument is a vector of pointers which holds all memory addresses to read. The second argument is a vector of boolean values with the same number of elements as the return type. The third is the explicit vector length of the operation. The return type and underlying type of the vector of pointers are the same vector types.

The align parameter attribute can be provided for the first argument.

Semantics:

The ‘llvm.vp.gather’ intrinsic reads multiple scalar values from memory in the same way as the ‘llvm.masked.gather’ intrinsic, where the mask is taken from the combination of the ‘mask’ and ‘evl’ arguments in the usual VP way. Certain ‘llvm.masked.gather’ arguments do not have corresponding arguments in ‘llvm.vp.gather’: the ‘passthru’ argument is implicitly poison; the ‘alignment’ argument is taken as the align parameter, if provided. The default alignment is taken as the ABI alignment of the source addresses as specified by the datalayout string.

Examples:
%r = call <8 x i8> @llvm.vp.gather.v8i8.v8p0(<8 x ptr> align 8 %ptrs, <8 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%also.r = call <8 x i8> @llvm.masked.gather.v8i8.v8p0(<8 x ptr> %ptrs, i32 8, <8 x i1> %mask, <8 x i8> poison)

‘llvm.vp.scatter’ Intrinsic

Syntax:

This is an overloaded intrinsic.

declare void @llvm.vp.scatter.v4f64.v4p0(<4 x double> %val, <4 x ptr> %ptrs, <4 x i1> %mask, i32 %evl)
declare void @llvm.vp.scatter.nxv2i8.nxv2p0(<vscale x 2 x i8> %val, <vscale x 2 x ptr> %ptrs, <vscale x 2 x i1> %mask, i32 %evl)
declare void @llvm.vp.scatter.v2f32.v2p2(<2 x float> %val, <2 x ptr addrspace(2)> %ptrs, <2 x i1> %mask, i32 %evl)
declare void @llvm.vp.scatter.nxv4i32.nxv4p4(<vscale x 4 x i32> %val, <vscale x 4 x ptr addrspace(4)> %ptrs, <vscale x 4 x i1> %mask, i32 %evl)

Overview:

The ‘llvm.vp.scatter.*’ intrinsic is the vector length predicated version of the llvm.masked.scatter intrinsic.

Arguments:

The first argument is a vector value to be written to memory. The second argument is a vector of pointers, pointing to where the value elements should be stored. The third argument is a vector of boolean values with the same number of elements as the value argument. The fourth is the explicit vector length of the operation.

The align parameter attribute can be provided for the second argument.

Semantics:

The ‘llvm.vp.scatter’ intrinsic writes multiple scalar values to memory in the same way as the ‘llvm.masked.scatter’ intrinsic, where the mask is taken from the combination of the ‘mask’ and ‘evl’ arguments in the usual VP way. The ‘alignment’ argument of the ‘llvm.masked.scatter’ does not have a corresponding argument in ‘llvm.vp.scatter’: it is instead provided via the optional align parameter attribute on the vector-of-pointers argument. Otherwise it is taken as the ABI alignment of the destination addresses as specified by the datalayout string.

Examples:
call void @llvm.vp.scatter.v8i8.v8p0(<8 x i8> %val, <8 x ptr> align 1 %ptrs, <8 x i1> %mask, i32 %evl)
;; For all lanes below %evl, the call above is lane-wise equivalent to the call below.
call void @llvm.masked.scatter.v8i8.v8p0(<8 x i8> %val, <8 x ptr> %ptrs, i32 1, <8 x i1> %mask)

‘llvm.vp.trunc.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i16> @llvm.vp.trunc.v16i16.v16i32(<16 x i32> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i16> @llvm.vp.trunc.nxv4i16.nxv4i32(<vscale x 4 x i32> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.trunc’ intrinsic truncates its first argument to the return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.trunc’ intrinsic takes a value to cast as its first argument. The return type is the type to cast the value to. Both types must be vectors of integer type. The bit size of the value must be larger than the bit size of the return type. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.trunc’ intrinsic truncates the high order bits in value and converts the remaining bits to return type. Since the source size must be larger than the destination size, ‘llvm.vp.trunc’ cannot be a no-op cast. It will always truncate bits. The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x i16> @llvm.vp.trunc.v4i16.v4i32(<4 x i32> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = trunc <4 x i32> %a to <4 x i16>
%also.r = select <4 x i1> %mask, <4 x i16> %t, <4 x i16> poison
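
Truncation keeps only the low-order bits of each active lane. A minimal Python sketch, with the hypothetical name vp_trunc, lane values given as unsigned bit patterns, and a POISON sentinel for disabled lanes:

```python
POISON = "poison"  # stand-in for a poison lane

def vp_trunc(vec, mask, evl, dst_bits):
    """Hypothetical model: keep the low dst_bits bits of each active lane."""
    keep = (1 << dst_bits) - 1
    return [
        (x & keep) if (i < evl and m) else POISON
        for i, (x, m) in enumerate(zip(vec, mask))
    ]

# i32 -> i16 on four lanes; lane 2 is masked off and lane 3 is past evl.
print(vp_trunc([0x12345678, 0xFFFF0001, 7, 9], [True, True, False, True], 3, 16))
```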

‘llvm.vp.zext.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.zext.v16i32.v16i16(<16 x i16> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.zext.nxv4i32.nxv4i16(<vscale x 4 x i16> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.zext’ intrinsic zero extends its first argument to the return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.zext’ intrinsic takes a value to cast as its first argument. The return type is the type to cast the value to. Both types must be vectors of integer type. The bit size of the value must be smaller than the bit size of the return type. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.zext’ intrinsic fills the high order bits of the value with zero bits until it reaches the size of the return type. When zero extending from i1, the result will always be either 0 or 1. The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x i32> @llvm.vp.zext.v4i32.v4i16(<4 x i16> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = zext <4 x i16> %a to <4 x i32>
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.sext.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.sext.v16i32.v16i16(<16 x i16> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.sext.nxv4i32.nxv4i16(<vscale x 4 x i16> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.sext’ intrinsic sign extends its first argument to the return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.sext’ intrinsic takes a value to cast as its first argument. The return type is the type to cast the value to. Both types must be vectors of integer type. The bit size of the value must be smaller than the bit size of the return type. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.sext’ intrinsic performs a sign extension by copying the sign bit (highest order bit) of the value until it reaches the size of the return type. When sign extending from i1, the result will always be either -1 or 0. The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x i32> @llvm.vp.sext.v4i32.v4i16(<4 x i16> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = sext <4 x i16> %a to <4 x i32>
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
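
Sign extension on bit patterns can be sketched as below. The Python model is illustrative only: vp_sext is a hypothetical name, lane values are unsigned bit patterns of src_bits bits, and results are returned as unsigned bit patterns of dst_bits bits (so -1 as an i8 appears as 0xFF).

```python
POISON = "poison"  # stand-in for a poison lane

def vp_sext(vec, mask, evl, src_bits, dst_bits):
    """Hypothetical model: copy the sign bit into the new high-order bits."""
    sign = 1 << (src_bits - 1)
    out = []
    for i, (x, m) in enumerate(zip(vec, mask)):
        if i < evl and m:
            x &= (1 << src_bits) - 1
            signed = (x ^ sign) - sign  # reinterpret as a signed value
            out.append(signed & ((1 << dst_bits) - 1))
        else:
            out.append(POISON)
    return out

# Sign extending from i1 yields all-ones (-1) or 0.
print(vp_sext([1, 0], [True, True], 2, 1, 8))  # [255, 0]
```

Zero extension in the same model would simply leave the masked source bit pattern unchanged.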

‘llvm.vp.fptrunc.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.fptrunc.v16f32.v16f64(<16 x double> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.fptrunc.nxv4f32.nxv4f64(<vscale x 4 x double> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.fptrunc’ intrinsic truncates its first argument to the return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.fptrunc’ intrinsic takes a value to cast as its first argument. The return type is the type to cast the value to. Both types must be vectors of floating-point type. The bit size of the value must be larger than the bit size of the return type. This implies that ‘llvm.vp.fptrunc’ cannot be used to make a no-op cast. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fptrunc’ intrinsic casts a value from a larger floating-point type to a smaller floating-point type. This instruction is assumed to execute in the default floating-point environment. The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x float> @llvm.vp.fptrunc.v4f32.v4f64(<4 x double> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = fptrunc <4 x double> %a to <4 x float>
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.fpext.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x double> @llvm.vp.fpext.v16f64.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x double> @llvm.vp.fpext.nxv4f64.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.fpext’ intrinsic extends its first argument to the return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.fpext’ intrinsic takes a value to cast as its first argument. The return type is the type to cast the value to. Both types must be vectors of floating-point type. The bit size of the value must be smaller than the bit size of the return type. This implies that ‘llvm.vp.fpext’ cannot be used to make a no-op cast. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fpext’ intrinsic extends the value from a smaller floating-point type to a larger floating-point type. The ‘llvm.vp.fpext’ cannot be used to make a no-op cast because it always changes bits. Use bitcast to make a no-op cast for a floating-point cast. The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x double> @llvm.vp.fpext.v4f64.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = fpext <4 x float> %a to <4 x double>
%also.r = select <4 x i1> %mask, <4 x double> %t, <4 x double> poison

‘llvm.vp.fptoui.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.fptoui.v16i32.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.fptoui.nxv4i32.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.fptoui.v256i64.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.fptoui’ intrinsic converts the floating-point argument to the unsigned integer return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.fptoui’ intrinsic takes a value to cast as its first argument. The value to cast must be a vector of floating-point type. The return type is the type to cast the value to. The return type must be a vector of integer type. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fptoui’ intrinsic converts its floating-point argument into the nearest (rounding towards zero) unsigned integer value where the lane position is below the explicit vector length and the vector mask is true. Masked-off lanes are poison. On enabled lanes where conversion takes place and the value cannot fit in the return type, the result on that lane is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.fptoui.v4i32.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = fptoui <4 x float> %a to <4 x i32>
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
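
The two sources of poison (disabled lanes and values that do not fit) can be told apart in a small model. This Python sketch is an illustration under assumptions: vp_fptoui is a hypothetical name, NaN and infinities are treated as values that cannot fit, and a value is taken to fit when its truncation toward zero lies in the unsigned range of dst_bits bits.

```python
import math

POISON = "poison"  # stand-in for a poison lane

def vp_fptoui(vec, mask, evl, dst_bits):
    """Hypothetical model: round toward zero, poison when out of range."""
    out = []
    for i, (x, m) in enumerate(zip(vec, mask)):
        if not (i < evl and m):
            out.append(POISON)      # disabled lane
        elif math.isnan(x) or math.isinf(x):
            out.append(POISON)      # assumed: cannot fit in any integer type
        else:
            t = math.trunc(x)       # rounding toward zero
            out.append(t if 0 <= t < (1 << dst_bits) else POISON)
    return out

# Converting to an 8-bit unsigned lane: 300.0 and -1.0 do not fit.
print(vp_fptoui([3.9, 300.0, -1.0, 2.5], [True] * 4, 4, 8))
```

The signed variant differs only in the range check, which would use the signed range of the return type.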

‘llvm.vp.fptosi.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.fptosi.v16i32.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.fptosi.nxv4i32.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.fptosi.v256i64.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.fptosi’ intrinsic converts the floating-point argument to the signed integer return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.fptosi’ intrinsic takes a value to cast as its first argument. The value to cast must be a vector of floating-point type. The return type is the type to cast the value to. The return type must be a vector of integer type. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fptosi’ intrinsic converts its floating-point argument into the nearest (rounding towards zero) signed integer value where the lane position is below the explicit vector length and the vector mask is true. Masked-off lanes are poison. On enabled lanes where conversion takes place and the value cannot fit in the return type, the result on that lane is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.fptosi.v4i32.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = fptosi <4 x float> %a to <4 x i32>
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.uitofp.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.uitofp.v16f32.v16i32(<16 x i32> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.uitofp.nxv4f32.nxv4i32(<vscale x 4 x i32> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.uitofp.v256f64.v256i64(<256 x i64> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.uitofp’ intrinsic converts its unsigned integer argument to the floating-point return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.uitofp’ intrinsic takes a value to cast as its first argument. The value to cast must be a vector of integer type. The return type is the type to cast the value to. The return type must be a vector of floating-point type. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.uitofp’ intrinsic interprets its first argument as an unsigned integer quantity and converts it to the corresponding floating-point value. If the value cannot be exactly represented, it is rounded using the default rounding mode. The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x float> @llvm.vp.uitofp.v4f32.v4i32(<4 x i32> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = uitofp <4 x i32> %a to <4 x float>
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.sitofp.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.sitofp.v16f32.v16i32(<16 x i32> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.sitofp.nxv4f32.nxv4i32(<vscale x 4 x i32> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.sitofp.v256f64.v256i64(<256 x i64> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.sitofp’ intrinsic converts its signed integer argument to the floating-point return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.sitofp’ intrinsic takes a value to cast as its first argument. The value to cast must be a vector of integer type. The return type is the type to cast the value to. The return type must be a vector of floating-point type. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.sitofp’ intrinsic interprets its first argument as a signed integer quantity and converts it to the corresponding floating-point value. If the value cannot be exactly represented, it is rounded using the default rounding mode. The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x float> @llvm.vp.sitofp.v4f32.v4i32(<4 x i32> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = sitofp <4 x i32> %a to <4 x float>
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.ptrtoint.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i8> @llvm.vp.ptrtoint.v16i8.v16p0(<16 x ptr> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i8> @llvm.vp.ptrtoint.nxv4i8.nxv4p0(<vscale x 4 x ptr> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.ptrtoint.v256i64.v256p0(<256 x ptr> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:

The ‘llvm.vp.ptrtoint’ intrinsic converts its pointer argument to the integer return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.ptrtoint’ intrinsic takes a value to cast as its first argument, which must be a vector of pointers. The return type is the type to cast the value to and must be a vector of integer type. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.ptrtoint’ intrinsic converts value to return type by interpreting the pointer value as an integer and either truncating or zero extending that value to the size of the integer type. If value is smaller than return type, then a zero extension is done. If value is larger than return type, then a truncation is done. If they are the same size, then nothing is done (no-op cast) other than a type change. The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x i8> @llvm.vp.ptrtoint.v4i8.v4p0(<4 x ptr> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = ptrtoint <4 x ptr> %a to <4 x i8>
%also.r = select <4 x i1> %mask, <4 x i8> %t, <4 x i8> poison

‘llvm.vp.inttoptr.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x ptr> @llvm.vp.inttoptr.v16p0.v16i32(<16 x i32> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x ptr> @llvm.vp.inttoptr.nxv4p0.nxv4i32(<vscale x 4 x i32> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x ptr> @llvm.vp.inttoptr.v256p0.v256i32(<256 x i32> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

The ‘llvm.vp.inttoptr’ intrinsic converts its integer value to the pointer return type. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.inttoptr’ intrinsic takes the value to cast as its first argument, which must be a vector of integer type. The return type is the type to cast to, which must be a vector of pointers. The second argument is the vector mask. The return type, the value to cast, and the vector mask have the same number of elements. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.inttoptr’ intrinsic converts the value to the return type by applying either a zero extension or a truncation depending on the size of the integer value. If the value is larger than the size of a pointer, then a truncation is done. If the value is smaller than the size of a pointer, then a zero extension is done. If they are the same size, nothing is done (no-op cast). The conversion is performed on lane positions below the explicit vector length and where the vector mask is true. Masked-off lanes are poison.

Examples:
%r = call <4 x ptr> @llvm.vp.inttoptr.v4p0.v4i32(<4 x i32> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = inttoptr <4 x i32> %a to <4 x ptr>
%also.r = select <4 x i1> %mask, <4 x ptr> %t, <4 x ptr> poison

‘llvm.vp.fcmp.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i1> @llvm.vp.fcmp.v16f32(<16 x float> <left_op>, <16 x float> <right_op>, metadata <condition code>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i1> @llvm.vp.fcmp.nxv4f32(<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, metadata <condition code>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i1> @llvm.vp.fcmp.v256f64(<256 x double> <left_op>, <256 x double> <right_op>, metadata <condition code>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

The ‘llvm.vp.fcmp’ intrinsic returns a vector of boolean values based on the comparison of its arguments. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.fcmp’ intrinsic takes the two values to compare as its first and second arguments. These two values must be vectors of floating-point types. The return type is the result of the comparison. The return type must be a vector of i1 type. The fourth argument is the vector mask. The return type, the values to compare, and the vector mask have the same number of elements. The third argument is the condition code indicating the kind of comparison to perform. It must be a metadata string with one of the supported floating-point condition code values. The fifth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fcmp’ intrinsic compares its first two arguments according to the condition code given as the third argument. The arguments are compared element by element on each enabled lane, where the semantics of the comparison are defined according to the condition code. Masked-off lanes are poison.

Examples:
%r = call <4 x i1> @llvm.vp.fcmp.v4f32(<4 x float> %a, <4 x float> %b, metadata !"oeq", <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = fcmp oeq <4 x float> %a, %b
%also.r = select <4 x i1> %mask, <4 x i1> %t, <4 x i1> poison
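As a concrete illustration, the following sketch evaluates ‘llvm.vp.fcmp’ on constant operands so that each lane can be checked by hand. The function name @fcmp_oeq_demo and the operand values are hypothetical and chosen only for illustration; the per-lane results in the comments follow from the ‘oeq’ condition code and the masking rule described above.

```llvm
declare <4 x i1> @llvm.vp.fcmp.v4f32(<4 x float>, <4 x float>, metadata, <4 x i1>, i32)

; Hypothetical helper: compares a constant vector with itself using "oeq".
; 0x7FF8000000000000 is a quiet NaN.
define <4 x i1> @fcmp_oeq_demo() {
  %r = call <4 x i1> @llvm.vp.fcmp.v4f32(
           <4 x float> <float 1.0, float 0x7FF8000000000000, float 3.0, float 4.0>,
           <4 x float> <float 1.0, float 0x7FF8000000000000, float 3.0, float 4.0>,
           metadata !"oeq",
           <4 x i1> <i1 true, i1 true, i1 true, i1 false>,
           i32 4)
  ; lane 0: enabled, 1.0 oeq 1.0 -> true
  ; lane 1: enabled, NaN oeq NaN -> false (an ordered comparison fails on NaN)
  ; lane 2: enabled, 3.0 oeq 3.0 -> true
  ; lane 3: masked off           -> poison
  ret <4 x i1> %r
}
```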

‘llvm.vp.icmp.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <32 x i1> @llvm.vp.icmp.v32i32(<32 x i32> <left_op>, <32 x i32> <right_op>, metadata <condition code>, <32 x i1> <mask>, i32 <vector_length>)
declare <vscale x 2 x i1> @llvm.vp.icmp.nxv2i32(<vscale x 2 x i32> <left_op>, <vscale x 2 x i32> <right_op>, metadata <condition code>, <vscale x 2 x i1> <mask>, i32 <vector_length>)
declare <128 x i1> @llvm.vp.icmp.v128i8(<128 x i8> <left_op>, <128 x i8> <right_op>, metadata <condition code>, <128 x i1> <mask>, i32 <vector_length>)
Overview:

The ‘llvm.vp.icmp’ intrinsic returns a vector of boolean values based on the comparison of its arguments. The operation has a mask and an explicit vector length parameter.

Arguments:

The ‘llvm.vp.icmp’ intrinsic takes the two values to compare as its first and second arguments. These two values must be vectors of integer types. The return type is the result of the comparison. The return type must be a vector of i1 type. The fourth argument is the vector mask. The return type, the values to compare, and the vector mask have the same number of elements. The third argument is the condition code indicating the kind of comparison to perform. It must be a metadata string with one of the supported integer condition code values. The fifth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.icmp’ intrinsic compares its first two arguments according to the condition code given as the third argument. The arguments are compared element by element on each enabled lane, where the semantics of the comparison are defined according to the condition code. Masked-off lanes are poison.

Examples:
%r = call <4 x i1> @llvm.vp.icmp.v4i32(<4 x i32> %a, <4 x i32> %b, metadata !"ne", <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = icmp ne <4 x i32> %a, %b
%also.r = select <4 x i1> %mask, <4 x i1> %t, <4 x i1> poison
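The interaction of the mask and the explicit vector length can be traced lane by lane with constant operands. The sketch below is illustrative only: the function name @icmp_ne_demo and the operand values are hypothetical, and the results in the comments are derived from the rules stated above (a lane is enabled only if it is below the explicit vector length and its mask bit is true).

```llvm
declare <4 x i1> @llvm.vp.icmp.v4i32(<4 x i32>, <4 x i32>, metadata, <4 x i1>, i32)

; Hypothetical helper: "ne" comparison with one masked-off lane and an
; explicit vector length of 3.
define <4 x i1> @icmp_ne_demo() {
  %r = call <4 x i1> @llvm.vp.icmp.v4i32(
           <4 x i32> <i32 1, i32 2, i32 3, i32 4>,
           <4 x i32> <i32 1, i32 0, i32 3, i32 0>,
           metadata !"ne",
           <4 x i1> <i1 true, i1 true, i1 false, i1 true>,
           i32 3)
  ; lane 0: enabled, 1 != 1                 -> false
  ; lane 1: enabled, 2 != 0                 -> true
  ; lane 2: masked off                      -> poison
  ; lane 3: not below the vector length (3) -> poison
  ret <4 x i1> %r
}
```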

‘llvm.vp.ceil.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.ceil.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.ceil.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.ceil.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated floating-point ceiling of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.ceil’ intrinsic performs floating-point ceiling (ceil) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x float> @llvm.vp.ceil.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.ceil.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.floor.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.floor.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.floor.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.floor.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated floating-point floor of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.floor’ intrinsic performs floating-point floor (floor) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x float> @llvm.vp.floor.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.floor.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.rint.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.rint.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.rint.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.rint.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated floating-point rint of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.rint’ intrinsic performs floating-point rint (rint) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x float> @llvm.vp.rint.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.rint.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.nearbyint.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.nearbyint.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.nearbyint.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.nearbyint.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated floating-point nearbyint of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.nearbyint’ intrinsic performs floating-point nearbyint (nearbyint) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x float> @llvm.vp.nearbyint.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.nearbyint.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.round.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.round.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.round.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.round.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated floating-point round of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.round’ intrinsic performs floating-point round (round) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x float> @llvm.vp.round.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.round.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison

‘llvm.vp.roundeven.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.roundeven.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.roundeven.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.roundeven.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated floating-point roundeven of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.roundeven’ intrinsic performs floating-point roundeven (roundeven) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x float> @llvm.vp.roundeven.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.roundeven.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison
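The difference between ‘llvm.vp.round’ and ‘llvm.vp.roundeven’ shows up only on halfway cases: round breaks ties away from zero, roundeven breaks ties toward the even integer. The sketch below applies both to the same constant vector with every lane enabled. The function names @round_demo and @roundeven_demo and the input values are hypothetical and chosen only for illustration.

```llvm
declare <4 x float> @llvm.vp.round.v4f32(<4 x float>, <4 x i1>, i32)
declare <4 x float> @llvm.vp.roundeven.v4f32(<4 x float>, <4 x i1>, i32)

; Hypothetical helper: ties are rounded away from zero.
define <4 x float> @round_demo() {
  %r = call <4 x float> @llvm.vp.round.v4f32(
           <4 x float> <float 0.5, float 1.5, float 2.5, float -2.5>,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 4)
  ; %r = <1.0, 2.0, 3.0, -3.0>
  ret <4 x float> %r
}

; Hypothetical helper: ties are rounded to the nearest even integer.
define <4 x float> @roundeven_demo() {
  %r = call <4 x float> @llvm.vp.roundeven.v4f32(
           <4 x float> <float 0.5, float 1.5, float 2.5, float -2.5>,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 4)
  ; %r = <0.0, 2.0, 2.0, -2.0>
  ret <4 x float> %r
}
```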

‘llvm.vp.roundtozero.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x float> @llvm.vp.roundtozero.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.roundtozero.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.roundtozero.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated floating-point round-to-zero of a vector of floating-point values.

Arguments:

The first argument and the result have the same vector of floating-point type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.roundtozero’ intrinsic performs floating-point round-to-zero (llvm.trunc) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x float> @llvm.vp.roundtozero.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x float> @llvm.trunc.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison
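A small worked case may help to see that round-to-zero discards the fractional part regardless of sign. The function name @roundtozero_demo and the input values in this sketch are hypothetical; the mask disables lane 2 to show the poison result on a disabled lane.

```llvm
declare <4 x float> @llvm.vp.roundtozero.v4f32(<4 x float>, <4 x i1>, i32)

; Hypothetical helper: truncation toward zero with lane 2 masked off.
define <4 x float> @roundtozero_demo() {
  %r = call <4 x float> @llvm.vp.roundtozero.v4f32(
           <4 x float> <float -1.5, float 1.5, float 2.75, float -0.5>,
           <4 x i1> <i1 true, i1 true, i1 false, i1 true>,
           i32 4)
  ; lane 0: -1.5 -> -1.0
  ; lane 1:  1.5 ->  1.0
  ; lane 2: masked off -> poison
  ; lane 3: -0.5 -> -0.0 (the sign of zero is preserved)
  ret <4 x float> %r
}
```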

‘llvm.vp.lrint.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.lrint.v16i32.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.lrint.nxv4i32.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.lrint.v256i64.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated lrint of a vector of floating-point values.

Arguments:

The result is an integer vector and the first argument is a vector of floating-point type with the same number of elements as the result vector type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.lrint’ intrinsic performs lrint (lrint) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.lrint.v4i32.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.lrint.v4i32.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.llrint.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.llrint.v16i32.v16f32(<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.llrint.nxv4i32.nxv4f32(<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.llrint.v256i64.v256f64(<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated llrint of a vector of floating-point values.

Arguments:

The result is an integer vector and the first argument is a vector of floating-point type with the same number of elements as the result vector type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.llrint’ intrinsic performs llrint (llrint) of the first vector argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.llrint.v4i32.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.llrint.v4i32.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.bitreverse.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.bitreverse.v16i32(<16 x i32> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.bitreverse.nxv4i32(<vscale x 4 x i32> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.bitreverse.v256i64(<256 x i64> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated bitreverse of a vector of integers.

Arguments:

The first argument and the result have the same vector of integer type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.bitreverse’ intrinsic performs bitreverse (bitreverse) of the first argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.bitreverse.v4i32(<4 x i32> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.bitreverse.v4i32(<4 x i32> %a)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.bswap.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.bswap.v16i32(<16 x i32> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.bswap.nxv4i32(<vscale x 4 x i32> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.bswap.v256i64(<256 x i64> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated bswap of a vector of integers.

Arguments:

The first argument and the result have the same vector of integer type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.bswap’ intrinsic performs bswap (bswap) of the first argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.bswap.v4i32(<4 x i32> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.bswap.v4i32(<4 x i32> %a)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.ctpop.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.ctpop.v16i32(<16 x i32> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.ctpop.nxv4i32(<vscale x 4 x i32> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.ctpop.v256i64(<256 x i64> <op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated ctpop of a vector of integers.

Arguments:

The first argument and the result have the same vector of integer type. The second argument is the vector mask and has the same number of elements as the result vector type. The third argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.ctpop’ intrinsic performs ctpop (ctpop) of the first argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.ctpop.v4i32(<4 x i32> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> %a)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
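For a quick sanity check of the lane semantics, the sketch below counts set bits in a constant vector with an explicit vector length of 3, so the last lane is disabled even though its mask bit is true. The function name @ctpop_demo and the operand values are hypothetical.

```llvm
declare <4 x i32> @llvm.vp.ctpop.v4i32(<4 x i32>, <4 x i1>, i32)

; Hypothetical helper: population count on the first three lanes only.
define <4 x i32> @ctpop_demo() {
  %r = call <4 x i32> @llvm.vp.ctpop.v4i32(
           <4 x i32> <i32 0, i32 1, i32 255, i32 -1>,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 3)
  ; lane 0: ctpop(0)   -> 0
  ; lane 1: ctpop(1)   -> 1
  ; lane 2: ctpop(255) -> 8
  ; lane 3: not below the vector length (3) -> poison
  ret <4 x i32> %r
}
```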

‘llvm.vp.ctlz.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.ctlz.v16i32(<16 x i32> <op>, i1 <is_zero_poison>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.ctlz.nxv4i32(<vscale x 4 x i32> <op>, i1 <is_zero_poison>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.ctlz.v256i64(<256 x i64> <op>, i1 <is_zero_poison>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated ctlz of a vector of integers.

Arguments:

The first argument and the result have the same vector of integer type. The second argument is a constant flag that indicates whether the intrinsic returns a valid result if the first argument is zero. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation. If the first argument is zero and the second argument is true, the result is poison.

Semantics:

The ‘llvm.vp.ctlz’ intrinsic performs ctlz (ctlz) of the first argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.ctlz.v4i32(<4 x i32> %a, i1 false, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %a, i1 false)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
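The effect of the is_zero_poison flag is easiest to see on a vector that contains a zero element. The sketch below passes false for the flag, so the zero lane yields the bit width. The function name @ctlz_demo and the operand values are hypothetical and serve only as an illustration.

```llvm
declare <4 x i32> @llvm.vp.ctlz.v4i32(<4 x i32>, i1, <4 x i1>, i32)

; Hypothetical helper: leading-zero count with is_zero_poison = false
; and lane 1 masked off.
define <4 x i32> @ctlz_demo() {
  %r = call <4 x i32> @llvm.vp.ctlz.v4i32(
           <4 x i32> <i32 0, i32 1, i32 256, i32 -1>,
           i1 false,
           <4 x i1> <i1 true, i1 false, i1 true, i1 true>,
           i32 4)
  ; lane 0: ctlz(0)   -> 32 (would be poison if the flag were true)
  ; lane 1: masked off -> poison
  ; lane 2: ctlz(256) -> 23
  ; lane 3: ctlz(-1)  -> 0
  ret <4 x i32> %r
}
```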

‘llvm.vp.cttz.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.cttz.v16i32(<16 x i32> <op>, i1 <is_zero_poison>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.cttz.nxv4i32(<vscale x 4 x i32> <op>, i1 <is_zero_poison>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.cttz.v256i64(<256 x i64> <op>, i1 <is_zero_poison>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated cttz of a vector of integers.

Arguments:

The first argument and the result have the same vector of integer type. The second argument is a constant flag that indicates whether the intrinsic returns a valid result if the first argument is zero. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation. If the first argument is zero and the second argument is true, the result is poison.

Semantics:

The ‘llvm.vp.cttz’ intrinsic performs cttz (cttz) of the first argument on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.cttz.v4i32(<4 x i32> %a, i1 false, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %a, i1 false)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.cttz.elts.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. You can use ‘llvm.vp.cttz.elts’ on any vector of integer elements, both fixed width and scalable.

declare i32 @llvm.vp.cttz.elts.i32.v16i32(<16 x i32> <op>, i1 <is_zero_poison>, <16 x i1> <mask>, i32 <vector_length>)
declare i64 @llvm.vp.cttz.elts.i64.nxv4i32(<vscale x 4 x i32> <op>, i1 <is_zero_poison>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare i64 @llvm.vp.cttz.elts.i64.v256i1(<256 x i1> <op>, i1 <is_zero_poison>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

The ‘llvm.vp.cttz.elts’ intrinsic counts the number of trailing zero elements of a vector. It is the vector-predicated version of ‘llvm.experimental.cttz.elts’.

Arguments:

The first argument is the vector to be counted. This argument must be a vector with integer element type. The return type must also be an integer type which is wide enough to hold the maximum number of elements of the source vector. The behavior of this intrinsic is undefined if the return type is not wide enough for the number of elements in the input vector.

The second argument is a constant flag that indicates whether the intrinsic returns a valid result if the first argument is all zero.

The third argument is the vector mask and has the same number of elements as the input vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.cttz.elts’ intrinsic counts the trailing (least significant / lowest-numbered) zero elements in the first argument on each enabled lane. If the first argument is all zero and the second argument is true, the result is poison. If the first argument is all zero and the second argument is false, the result is the explicit vector length (i.e. the fourth argument).
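Examples:

The following is a minimal sketch rather than an authoritative example. It uses the `.i32.v4i32` overload, whose name is inferred from the mangling pattern of the declarations above, and keeps every lane enabled so that the results depend only on the rules stated in this section. The function names @first_nonzero_demo and @all_zero_demo and the operand values are hypothetical.

```llvm
declare i32 @llvm.vp.cttz.elts.i32.v4i32(<4 x i32>, i1, <4 x i1>, i32)

; Hypothetical helper: lanes 0 and 1 are zero, lane 2 is the first
; non-zero element, so two trailing zero elements are counted.
define i32 @first_nonzero_demo() {
  %r = call i32 @llvm.vp.cttz.elts.i32.v4i32(
           <4 x i32> <i32 0, i32 0, i32 5, i32 0>,
           i1 false,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 4)
  ; %r = 2
  ret i32 %r
}

; Hypothetical helper: the input is all zero and is_zero_poison is false,
; so the result is the explicit vector length.
define i32 @all_zero_demo() {
  %r = call i32 @llvm.vp.cttz.elts.i32.v4i32(
           <4 x i32> zeroinitializer,
           i1 false,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 4)
  ; %r = 4
  ret i32 %r
}
```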

‘llvm.vp.sadd.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.sadd.sat.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.sadd.sat.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.sadd.sat.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated signed saturating addition of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.sadd.sat’ intrinsic performs sadd.sat (sadd.sat) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.sadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
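Saturation is easier to observe on a narrow element type. The sketch below uses an i8 overload, whose `.v4i8` suffix is inferred from the mangling pattern above, so that sums outside the range -128 to 127 clamp to those bounds. The function name @sadd_sat_demo and the operand values are hypothetical.

```llvm
declare <4 x i8> @llvm.vp.sadd.sat.v4i8(<4 x i8>, <4 x i8>, <4 x i1>, i32)

; Hypothetical helper: signed saturating addition on i8 lanes,
; all lanes enabled.
define <4 x i8> @sadd_sat_demo() {
  %r = call <4 x i8> @llvm.vp.sadd.sat.v4i8(
           <4 x i8> <i8 100, i8 -100, i8 1, i8 127>,
           <4 x i8> <i8 100, i8 -100, i8 2, i8 1>,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 4)
  ; lane 0:  100 +  100 ->  127 (clamped to the signed maximum)
  ; lane 1: -100 + -100 -> -128 (clamped to the signed minimum)
  ; lane 2:    1 +    2 ->    3
  ; lane 3:  127 +    1 ->  127 (clamped to the signed maximum)
  ret <4 x i8> %r
}
```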

‘llvm.vp.uadd.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.uadd.sat.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.uadd.sat.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.uadd.sat.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated unsigned saturating addition of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.uadd.sat’ intrinsic performs uadd.sat (uadd.sat) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.uadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.ssub.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.ssub.sat.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.ssub.sat.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.ssub.sat.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated signed saturating subtraction of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.ssub.sat’ intrinsic performs ssub.sat (ssub.sat) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.ssub.sat.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.usub.sat.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.usub.sat.v16i32(<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.usub.sat.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.usub.sat.v256i64(<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated unsigned saturating subtraction of two vectors of integers.

Arguments:

The first two arguments and the result have the same vector of integer type. The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.usub.sat’ intrinsic performs usub.sat (usub.sat) of the first and second vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.usub.sat.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> %a, <4 x i32> %b)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
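For unsigned saturating subtraction, any difference that would be negative clamps to zero. The sketch below shows this on constant operands with all lanes enabled; the function name @usub_sat_demo and the operand values are hypothetical.

```llvm
declare <4 x i32> @llvm.vp.usub.sat.v4i32(<4 x i32>, <4 x i32>, <4 x i1>, i32)

; Hypothetical helper: unsigned saturating subtraction, all lanes enabled.
define <4 x i32> @usub_sat_demo() {
  %r = call <4 x i32> @llvm.vp.usub.sat.v4i32(
           <4 x i32> <i32 3, i32 10, i32 0, i32 100>,
           <4 x i32> <i32 5, i32 4, i32 1, i32 50>,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 4)
  ; lane 0:   3 -  5 -> 0 (clamped to zero)
  ; lane 1:  10 -  4 -> 6
  ; lane 2:   0 -  1 -> 0 (clamped to zero)
  ; lane 3: 100 - 50 -> 50
  ret <4 x i32> %r
}
```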

‘llvm.vp.fshl.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.fshl.v16i32(<16 x i32> <left_op>, <16 x i32> <middle_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.fshl.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <middle_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.fshl.v256i64(<256 x i64> <left_op>, <256 x i64> <middle_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated fshl of three vectors of integers.

Arguments:

The first three arguments and the result have the same vector of integer type. The fourth argument is the vector mask and has the same number of elements as the result vector type. The fifth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fshl’ intrinsic performs fshl (fshl) of the first, second, and third vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison
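The funnel shift can be checked by hand on constants. For each lane, fshl concatenates the first operand (high half) with the second operand (low half), shifts the 64-bit value left by the third operand modulo 32, and returns the high 32 bits. The sketch below is illustrative only; the function name @fshl_demo and the operand values are hypothetical.

```llvm
declare <4 x i32> @llvm.vp.fshl.v4i32(<4 x i32>, <4 x i32>, <4 x i32>, <4 x i1>, i32)

; Hypothetical helper: funnel shift left on constants, all lanes enabled.
define <4 x i32> @fshl_demo() {
  %r = call <4 x i32> @llvm.vp.fshl.v4i32(
           <4 x i32> <i32 1, i32 0, i32 5, i32 3>,
           <4 x i32> <i32 0, i32 -2147483648, i32 7, i32 0>,
           <4 x i32> <i32 4, i32 1, i32 32, i32 33>,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 4)
  ; lane 0: (1 << 4) | (0 >> 28)                  -> 16
  ; lane 1: (0 << 1) | (0x80000000 >> 31)         -> 1
  ; lane 2: shift amount 32 mod 32 = 0            -> 5 (first operand unchanged)
  ; lane 3: shift amount 33 mod 32 = 1, (3 << 1)  -> 6
  ret <4 x i32> %r
}
```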

‘llvm.vp.fshr.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <16 x i32> @llvm.vp.fshr.v16i32(<16 x i32> <left_op>, <16 x i32> <middle_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x i32> @llvm.vp.fshr.nxv4i32(<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <middle_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x i64> @llvm.vp.fshr.v256i64(<256 x i64> <left_op>, <256 x i64> <middle_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated fshr of three vectors of integers.

Arguments:

The first three arguments and the result have the same vector of integer type. The fourth argument is the vector mask and has the same number of elements as the result vector type. The fifth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.fshr’ intrinsic performs fshr (fshr) of the first, second, and third vector arguments on each enabled lane. The result on disabled lanes is a poison value.

Examples:
%r = call <4 x i32> @llvm.vp.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r
%t = call <4 x i32> @llvm.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c)
%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> poison

‘llvm.vp.is.fpclass.*’ Intrinsics

Syntax:

This is an overloaded intrinsic.

declare <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f32(<vscale x 2 x float> <op>, i32 <test>, <vscale x 2 x i1> <mask>, i32 <vector_length>)
declare <2 x i1> @llvm.vp.is.fpclass.v2f16(<2 x half> <op>, i32 <test>, <2 x i1> <mask>, i32 <vector_length>)
Overview:

Predicated version of llvm.is.fpclass.

Arguments:

The first argument is a floating-point vector. The result type is a vector of boolean with the same number of elements as the first argument. The second argument specifies which tests to perform (see llvm.is.fpclass). The third argument is the vector mask and has the same number of elements as the result vector type. The fourth argument is the explicit vector length of the operation.

Semantics:

The ‘llvm.vp.is.fpclass’ intrinsic performs llvm.is.fpclass (llvm.is.fpclass).

Examples:
%r = call <2 x i1> @llvm.vp.is.fpclass.v2f16(<2 x half> %x, i32 3, <2 x i1> %m, i32 %evl)
%t = call <vscale x 2 x i1> @llvm.vp.is.fpclass.nxv2f16(<vscale x 2 x half> %x, i32 3, <vscale x 2 x i1> %m, i32 %evl)
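The test value 3 used above selects the two NaN classes of llvm.is.fpclass (bit 0 for signaling NaN, bit 1 for quiet NaN), so the call acts as a lane-wise NaN check. The sketch below works this through on constants with every lane enabled. The function name @is_nan_demo, the `.v4f32` overload (inferred from the mangling pattern above), and the operand values are assumptions made for illustration.

```llvm
declare <4 x i1> @llvm.vp.is.fpclass.v4f32(<4 x float>, i32, <4 x i1>, i32)

; Hypothetical helper: tests each lane for NaN (test mask 3 = snan | qnan).
; 0x7FF8000000000000 is a quiet NaN, 0x7FF0000000000000 is +infinity.
define <4 x i1> @is_nan_demo() {
  %r = call <4 x i1> @llvm.vp.is.fpclass.v4f32(
           <4 x float> <float 0x7FF8000000000000, float 1.0,
                        float 0x7FF0000000000000, float 0.0>,
           i32 3,
           <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
           i32 4)
  ; lane 0: quiet NaN -> true
  ; lane 1: 1.0       -> false
  ; lane 2: +infinity -> false
  ; lane 3: 0.0       -> false
  ret <4 x i1> %r
}
```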

Masked Vector Load and Store Intrinsics

LLVM provides intrinsics for predicated vector load and store operations. The predicate is specified by a mask argument, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the “off” lanes are not accessed. When all bits of the mask are on, the intrinsic is identical to a regular vector load or store. When all bits are off, no memory is accessed.

‘llvm.masked.load.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. The loaded data is a vector of any integer, floating-point or pointer data type.

declare <16 x float> @llvm.masked.load.v16f32.p0(ptr <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
declare <2 x double> @llvm.masked.load.v2f64.p0(ptr <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
;; The data is a vector of pointers
declare <8 x ptr> @llvm.masked.load.v8p0.p0(ptr <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x ptr> <passthru>)

Overview:

Reads a vector from memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the ‘passthru’ argument.

Arguments:

The first argument is the base pointer for the load. The second argument is the alignment of the source location. It must be a power of two constant integer value. The third argument, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the base pointer and the type of the ‘passthru’ argument are the same vector types.

Semantics:

The ‘llvm.masked.load’ intrinsic is designed for conditional reading of selected vector elements in a single IR operation. It is useful for targets that support vector masked loads and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar load operations.

The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask, except that the masked-off lanes are not accessed. Only the masked-on lanes of the vector need to be inbounds of an allocation (but all these lanes need to be inbounds of the same allocation). In particular, using this intrinsic prevents exceptions on memory accesses to masked-off lanes. Masked-off lanes are also not considered accessed for the purpose of data races or noalias constraints.

%res = call <16 x float> @llvm.masked.load.v16f32.p0(ptr %ptr, i32 4, <16 x i1> %mask, <16 x float> %passthru)

;; The result of the two following instructions is identical aside from potential memory access exception
%loadlal = load <16 x float>, ptr %ptr, align 4
%res = select <16 x i1> %mask, <16 x float> %loadlal, <16 x float> %passthru
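The difference from a plain load followed by a select can be sketched with a small Python model. This is only an illustration: masked_load is a hypothetical helper, a Python list stands in for one allocation, and element indices stand in for addresses.

```python
def masked_load(memory, base, mask, passthru):
    """Return one value per lane. Only masked-on lanes read memory;
    masked-off lanes take the corresponding passthru element."""
    result = []
    for lane, (enabled, fallback) in enumerate(zip(mask, passthru)):
        result.append(memory[base + lane] if enabled else fallback)
    return result


# Only three elements are allocated, but the vector has four lanes.
memory = [10.0, 11.0, 12.0]
loaded = masked_load(
    memory, 0, [True, False, True, False], [0.0, -1.0, 0.0, -1.0]
)
```

Lane 3 would be out of bounds for an unmasked four-element load. Because it is masked off, the model never touches that index.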

‘llvm.masked.store.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating-point or pointer data type.

declare void @llvm.masked.store.v8i32.p0(<8 x i32> <value>, ptr <ptr>, i32 <alignment>, <8 x i1> <mask>)
declare void @llvm.masked.store.v16f32.p0(<16 x float> <value>, ptr <ptr>, i32 <alignment>, <16 x i1> <mask>)
;; The data is a vector of pointers
declare void @llvm.masked.store.v8p0.p0(<8 x ptr> <value>, ptr <ptr>, i32 <alignment>, <8 x i1> <mask>)

Overview:

Writes a vector to memory according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.

Arguments:

The first argument is the vector value to be written to memory. The second argument is the base pointer for the store; it has the same underlying type as the value argument. The third argument is the alignment of the destination location. It must be a power of two constant integer value. The fourth argument, mask, is a vector of boolean values. The types of the mask and the value argument must have the same number of vector elements.

Semantics:

The ‘llvm.masked.store’ intrinsic is designed for conditional writing of selected vector elements in a single IR operation. It is useful for targets that support vector masked store and allows vectorizing predicated basic blocks on these targets. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.

The result of this operation is equivalent to a load-modify-store sequence, except that the masked-off lanes are not accessed. Only the masked-on lanes of the vector need to be inbounds of an allocation (but all these lanes need to be inbounds of the same allocation). In particular, using this intrinsic prevents exceptions on memory accesses to masked-off lanes. Masked-off lanes are also not considered accessed for the purpose of data races or noalias constraints.

call void @llvm.masked.store.v16f32.p0(<16 x float> %value, ptr %ptr, i32 4, <16 x i1> %mask)

;; The result of the following instructions is identical aside from potential data races and memory access exceptions
%oldval = load <16 x float>, ptr %ptr, align 4
%res = select <16 x i1> %mask, <16 x float> %value, <16 x float> %oldval
store <16 x float> %res, ptr %ptr, align 4
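A minimal Python sketch of the same idea, under the assumption that a list models one allocation and indices model addresses. The helper name masked_store is hypothetical.

```python
def masked_store(memory, base, value, mask):
    """Write value[lane] to memory only where the mask is on.
    Masked-off lanes are neither read nor written."""
    for lane, (enabled, element) in enumerate(zip(mask, value)):
        if enabled:
            memory[base + lane] = element


# Three allocated elements, four vector lanes; lane 3 is masked off.
memory = [0, 0, 0]
masked_store(memory, 0, [1, 2, 3, 4], [True, False, True, False])
```

Unlike the load-modify-store sequence, the model does not read the old contents at all, which is why masked-off lanes cannot fault or race.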

Masked Vector Gather and Scatter Intrinsics

LLVM provides intrinsics for vector gather and scatter operations. They are similar to Masked Vector Load and Store, except they are designed for arbitrary memory accesses, rather than sequential memory accesses. Gather and scatter also employ a mask argument, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the “off” lanes are not accessed. When all bits are off, no memory is accessed.

‘llvm.masked.gather.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. The loaded data are multiple scalar values of any integer, floating-point or pointer data type gathered together into one vector.

declare <16 x float> @llvm.masked.gather.v16f32.v16p0(<16 x ptr> <ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>)
declare <2 x double> @llvm.masked.gather.v2f64.v2p1(<2 x ptr addrspace(1)> <ptrs>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>)
declare <8 x ptr> @llvm.masked.gather.v8p0.v8p0(<8 x ptr> <ptrs>, i32 <alignment>, <8 x i1> <mask>, <8 x ptr> <passthru>)

Overview:

Reads scalar values from arbitrary memory locations and gathers them into one vector. The memory locations are provided in the vector of pointers ‘ptrs’. The memory is accessed according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the ‘passthru’ argument.

Arguments:

The first argument is a vector of pointers which holds all memory addresses to read. The second argument is an alignment of the source addresses. It must be 0 or a power of two constant integer value. The third argument, mask, is a vector of boolean values with the same number of elements as the return type. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The return type, underlying type of the vector of pointers and the type of the ‘passthru’ argument are the same vector types.

Semantics:

The ‘llvm.masked.gather’ intrinsic is designed for conditional reading of multiple scalar values from arbitrary memory locations in a single IR operation. It is useful for targets that support vector masked gathers and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of scalar load operations.

The semantics of this operation are equivalent to a sequence of conditional scalar loads, followed by gathering all loaded values into a single vector. The mask restricts memory access to certain lanes and facilitates vectorization of predicated basic blocks.

%res = call <4 x double> @llvm.masked.gather.v4f64.v4p0(<4 x ptr> %ptrs, i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison)

;; The gather with all-true mask is equivalent to the following instruction sequence
%ptr0 = extractelement <4 x ptr> %ptrs, i32 0
%ptr1 = extractelement <4 x ptr> %ptrs, i32 1
%ptr2 = extractelement <4 x ptr> %ptrs, i32 2
%ptr3 = extractelement <4 x ptr> %ptrs, i32 3

%val0 = load double, ptr %ptr0, align 8
%val1 = load double, ptr %ptr1, align 8
%val2 = load double, ptr %ptr2, align 8
%val3 = load double, ptr %ptr3, align 8

%vec0 = insertelement <4 x double> poison, double %val0, i32 0
%vec01 = insertelement <4 x double> %vec0, double %val1, i32 1
%vec012 = insertelement <4 x double> %vec01, double %val2, i32 2
%vec0123 = insertelement <4 x double> %vec012, double %val3, i32 3
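For masks that are not all-true, the conditional scalar loads can be sketched in Python. This is an illustrative model only: masked_gather is a hypothetical helper and a dictionary keyed by made-up addresses stands in for memory.

```python
def masked_gather(memory, ptrs, mask, passthru):
    """Load memory[ptr] for each masked-on lane; use passthru otherwise."""
    return [
        memory[ptr] if enabled else fallback
        for ptr, enabled, fallback in zip(ptrs, mask, passthru)
    ]


# Hypothetical addresses; 0xDEAD is not mapped but its lane is masked off.
memory = {0x100: 1.5, 0x200: 2.5, 0x300: 3.5}
gathered = masked_gather(
    memory,
    [0x300, 0xDEAD, 0x100, 0x200],
    [True, False, True, True],
    [0.0, 9.0, 0.0, 0.0],
)
```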

‘llvm.masked.scatter.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. The data stored in memory is a vector of any integer, floating-point or pointer data type. Each vector element is stored in an arbitrary memory address. Scatter with overlapping addresses is guaranteed to be ordered from least-significant to most-significant element.

declare void @llvm.masked.scatter.v8i32.v8p0(<8 x i32> <value>, <8 x ptr> <ptrs>, i32 <alignment>, <8 x i1> <mask>)
declare void @llvm.masked.scatter.v16f32.v16p1(<16 x float> <value>, <16 x ptr addrspace(1)> <ptrs>, i32 <alignment>, <16 x i1> <mask>)
declare void @llvm.masked.scatter.v4p0.v4p0(<4 x ptr> <value>, <4 x ptr> <ptrs>, i32 <alignment>, <4 x i1> <mask>)

Overview:

Writes each element from the value vector to the corresponding memory address. The memory addresses are represented as a vector of pointers. Writing is done according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent memory accesses to the masked-off lanes.

Arguments:

The first argument is a vector value to be written to memory. The second argument is a vector of pointers, pointing to where the value elements should be stored. It has the same underlying type as the value argument. The third argument is an alignment of the destination addresses. It must be 0 or a power of two constant integer value. The fourth argument, mask, is a vector of boolean values. The types of the mask and the value argument must have the same number of vector elements.

Semantics:

The ‘llvm.masked.scatter’ intrinsic is designed for writing selected vector elements to arbitrary memory addresses in a single IR operation. The operation may be conditional, when not all bits in the mask are switched on. It is useful for targets that support vector masked scatter and allows vectorizing basic blocks with data and control divergence. Other targets may support this intrinsic differently, for example by lowering it into a sequence of branches that guard scalar store operations.

;; This instruction unconditionally stores data vector in multiple addresses
call void @llvm.masked.scatter.v8i32.v8p0(<8 x i32> %value, <8 x ptr> %ptrs, i32 4, <8 x i1> <true, true, .. true>)

;; It is equivalent to a list of scalar stores
%val0 = extractelement <8 x i32> %value, i32 0
%val1 = extractelement <8 x i32> %value, i32 1
..
%val7 = extractelement <8 x i32> %value, i32 7
%ptr0 = extractelement <8 x ptr> %ptrs, i32 0
%ptr1 = extractelement <8 x ptr> %ptrs, i32 1
..
%ptr7 = extractelement <8 x ptr> %ptrs, i32 7
;; Note: the order of the following stores is important when they overlap:
store i32 %val0, ptr %ptr0, align 4
store i32 %val1, ptr %ptr1, align 4
..
store i32 %val7, ptr %ptr7, align 4
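The ordering guarantee for overlapping addresses can be illustrated with a small Python model. The helper masked_scatter and the addresses are hypothetical; a dictionary stands in for memory.

```python
def masked_scatter(memory, value, ptrs, mask):
    """Store value[lane] to ptrs[lane] for each masked-on lane, visiting
    lanes from lowest to highest so that later lanes win on overlap."""
    for element, ptr, enabled in zip(value, ptrs, mask):
        if enabled:
            memory[ptr] = element


memory = {}
# Lanes 0 and 2 target the same address; lane 3 is masked off.
masked_scatter(
    memory,
    [1, 2, 3, 4],
    [0x100, 0x104, 0x100, 0x108],
    [True, True, True, False],
)
```

After the call, address 0x100 holds the value from lane 2, because lane 2 is stored after lane 0.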

Masked Vector Expanding Load and Compressing Store Intrinsics

LLVM provides intrinsics for expanding load and compressing store operations. Data selected from a vector according to a mask is stored in consecutive memory addresses (compressed store), and vice-versa (expanding load). These operations effectively map to “if (cond.i) a[j++] = v.i” and “if (cond.i) v.i = a[j++]” patterns, respectively. Note that when the mask starts with ‘1’ bits followed by ‘0’ bits, these operations are identical to llvm.masked.store and llvm.masked.load.

‘llvm.masked.expandload.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. Several values of integer, floating point or pointer data type are loaded from consecutive memory addresses and stored into the elements of a vector according to the mask.

declare <16 x float> @llvm.masked.expandload.v16f32(ptr <ptr>, <16 x i1> <mask>, <16 x float> <passthru>)
declare <2 x i64> @llvm.masked.expandload.v2i64(ptr <ptr>, <2 x i1> <mask>, <2 x i64> <passthru>)

Overview:

Reads a number of scalar values sequentially from memory location provided in ‘ptr’ and spreads them in a vector. The ‘mask’ holds a bit for each vector lane. The number of elements read from memory is equal to the number of ‘1’ bits in the mask. The loaded elements are positioned in the destination vector according to the sequence of ‘1’ and ‘0’ bits in the mask. E.g., if the mask vector is ‘10010001’, “expandload” reads 3 values from memory addresses ptr, ptr+1, ptr+2 and places them in lanes 0, 3 and 7 accordingly. The masked-off lanes are filled by elements from the corresponding lanes of the ‘passthru’ argument.

Arguments:

The first argument is the base pointer for the load. It has the same underlying type as the element of the returned vector. The second argument, mask, is a vector of boolean values with the same number of elements as the return type. The third is a pass-through value that is used to fill the masked-off lanes of the result. The return type and the type of the ‘passthru’ argument have the same vector type.

The align parameter attribute can be provided for the first argument. The pointer alignment defaults to 1.

Semantics:

The ‘llvm.masked.expandload’ intrinsic is designed for reading multiple scalar values from adjacent memory addresses into possibly non-adjacent vector lanes. It is useful for targets that support vector expanding loads and allows vectorizing loops with cross-iteration dependencies like in the following example:

// In this loop we load from B and spread the elements into array A.
double *A, *B; int *C;
for (int i = 0; i < size; ++i) {
  if (C[i] != 0)
    A[i] = B[j++];
}

; Load several elements from array B and expand them in a vector.
; The number of loaded elements is equal to the number of '1' elements in the Mask.
%Tmp = call <8 x double> @llvm.masked.expandload.v8f64(ptr %Bptr, <8 x i1> %Mask, <8 x double> poison)
; Store the result in A
call void @llvm.masked.store.v8f64.p0(<8 x double> %Tmp, ptr %Aptr, i32 8, <8 x i1> %Mask)

; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
%MaskI = bitcast <8 x i1> %Mask to i8
%MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
%MaskI64 = zext i8 %MaskIPopcnt to i64
%BNextInd = add i64 %BInd, %MaskI64

Other targets may support this intrinsic differently, for example, by lowering it into a sequence of conditional scalar load operations and shuffles. If all mask elements are ‘1’, the intrinsic behavior is equivalent to the regular unmasked vector load.
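The placement rule described in the overview can be sketched as a small Python model. The helper expandload is hypothetical, a list stands in for memory, and lane 0 corresponds to the leftmost character of the ‘10010001’ mask used in the overview.

```python
def expandload(memory, base, mask, passthru):
    """Read consecutive elements starting at base, placing one in each
    masked-on lane; masked-off lanes take the passthru element."""
    result = []
    next_index = base
    for enabled, fallback in zip(mask, passthru):
        if enabled:
            result.append(memory[next_index])
            next_index += 1
        else:
            result.append(fallback)
    return result


mask = [bit == "1" for bit in "10010001"]
memory = [1.0, 2.0, 3.0]  # exactly three elements are read
expanded = expandload(memory, 0, mask, [0.0] * 8)
```

Three values are read from consecutive positions and land in lanes 0, 3 and 7, matching the example in the overview.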

‘llvm.masked.compressstore.*’ Intrinsics

Syntax:

This is an overloaded intrinsic. A number of scalar values of integer, floating point or pointer data type are collected from an input vector and stored into adjacent memory addresses. A mask defines which elements to collect from the vector.

declare void @llvm.masked.compressstore.v8i32(<8 x i32> <value>, ptr <ptr>, <8 x i1> <mask>)
declare void @llvm.masked.compressstore.v16f32(<16 x float> <value>, ptr <ptr>, <16 x i1> <mask>)

Overview:

Selects elements from input vector ‘value’ according to the ‘mask’. All selected elements are written into adjacent memory addresses starting at address ‘ptr’, from lower to higher. The mask holds a bit for each vector lane, and is used to select elements to be stored. The number of elements to be stored is equal to the number of active bits in the mask.

Arguments:

The first argument is the input vector, from which elements are collected and written to memory. The second argument is the base pointer for the store; it has the same underlying type as the element of the input vector argument. The third argument is the mask, a vector of boolean values. The mask and the input vector must have the same number of vector elements.

The align parameter attribute can be provided for the second argument. The pointer alignment defaults to 1.

Semantics:

The ‘llvm.masked.compressstore’ intrinsic is designed for compressing data in memory. It allows collecting elements from possibly non-adjacent lanes of a vector and storing them contiguously in memory in one IR operation. It is useful for targets that support compressing store operations and allows vectorizing loops with cross-iteration dependencies like in the following example:

// In this loop we load elements from A and store them consecutively in B
double *A, *B; int *C;
for (int i = 0; i < size; ++i) {
  if (C[i] != 0)
    B[j++] = A[i];
}

; Load elements from A.
%Tmp = call <8 x double> @llvm.masked.load.v8f64.p0(ptr %Aptr, i32 8, <8 x i1> %Mask, <8 x double> poison)
; Store all selected elements consecutively in array B
call void @llvm.masked.compressstore.v8f64(<8 x double> %Tmp, ptr %Bptr, <8 x i1> %Mask)

; %Bptr should be increased on each iteration according to the number of '1' elements in the Mask.
%MaskI = bitcast <8 x i1> %Mask to i8
%MaskIPopcnt = call i8 @llvm.ctpop.i8(i8 %MaskI)
%MaskI64 = zext i8 %MaskIPopcnt to i64
%BNextInd = add i64 %BInd, %MaskI64

Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.
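A minimal Python model of the compressing store, offered as an illustration only. The helper compressstore is hypothetical, and returning the number of stored elements is a convenience of the sketch that mirrors the population count used to advance the pointer in the example above; the intrinsic itself returns void.

```python
def compressstore(memory, base, value, mask):
    """Write the masked-on elements of value to consecutive positions
    starting at base. Returns how many elements were written."""
    next_index = base
    for element, enabled in zip(value, mask):
        if enabled:
            memory[next_index] = element
            next_index += 1
    return next_index - base


memory = [0.0] * 4
written = compressstore(
    memory,
    0,
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [False, True, False, False, True, True, False, False],
)
```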

Memory Use Markers

This class of intrinsics provides information about the lifetime of allocated objects and ranges where variables are immutable.

‘llvm.lifetime.start’ Intrinsic

Syntax:

declare void @llvm.lifetime.start(i64 <size>, ptr captures(none) <ptr>)

Overview:

The ‘llvm.lifetime.start’ intrinsic specifies the start of a memory object’s lifetime.

Arguments:

The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.

Semantics:

If ptr is a stack-allocated object and it points to the first byte of the object, the object is initially marked as dead. ptr is conservatively considered as a non-stack-allocated object if the stack coloring algorithm that is used in the optimization pipeline cannot conclude that ptr is a stack-allocated object.

After ‘llvm.lifetime.start’, the stack object that ptr points to is marked as alive and has an uninitialized value. The stack object is marked as dead when either llvm.lifetime.end to the alloca is executed or the function returns.

After llvm.lifetime.end is called, ‘llvm.lifetime.start’ on the stack object can be called again. The second ‘llvm.lifetime.start’ call marks the object as alive, but it does not change the address of the object.

If ptr is a non-stack-allocated object, does not point to the first byte of the object, or is a stack object that is already alive, the intrinsic simply fills all bytes of the object with poison.

‘llvm.lifetime.end’ Intrinsic

Syntax:

declare void @llvm.lifetime.end(i64 <size>, ptr captures(none) <ptr>)

Overview:

The ‘llvm.lifetime.end’ intrinsic specifies the end of an allocated object’s lifetime.

Arguments:

The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.

Semantics:

If ptr is a stack-allocated object and it points to the first byte of the object, the object is dead. ptr is conservatively considered as a non-stack-allocated object if the stack coloring algorithm that is used in the optimization pipeline cannot conclude that ptr is a stack-allocated object.

Calling llvm.lifetime.end on an already dead alloca is a no-op.

If ptr is a non-stack-allocated object or it does not point to the first byte of the object, it is equivalent to simply filling all bytes of the object with poison.
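The dead/alive transitions for a stack object can be summarized with a toy Python state machine. This is a rough sketch under stated assumptions: the class StackObject and the string markers are hypothetical, only the case where ptr is a stack-allocated object pointing at its first byte is modelled, and the contents of a dead object are deliberately left unspecified.

```python
POISON = "poison"
UNINITIALIZED = "uninitialized"


class StackObject:
    """Toy model of one alloca that is used with lifetime markers."""

    def __init__(self, size):
        self.alive = False  # initially marked as dead
        self.contents = [UNINITIALIZED] * size

    def lifetime_start(self):
        if self.alive:
            # Starting an already-alive object fills it with poison.
            self.contents = [POISON] * len(self.contents)
        else:
            self.alive = True
            self.contents = [UNINITIALIZED] * len(self.contents)

    def lifetime_end(self):
        # Ending an already-dead object is a no-op.
        self.alive = False


obj = StackObject(4)
obj.lifetime_start()
obj.contents[0] = 42
obj.lifetime_start()  # already alive: contents become poison
after_double_start = list(obj.contents)
obj.lifetime_end()
obj.lifetime_end()    # no-op
obj.lifetime_start()  # alive again, uninitialized
```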

‘llvm.invariant.start’ Intrinsic

Syntax:

This is an overloaded intrinsic. The allocated object can belong to any address space.

declare ptr @llvm.invariant.start.p0(i64 <size>, ptr captures(none) <ptr>)

Overview:

The ‘llvm.invariant.start’ intrinsic specifies that the contents of an allocated object will not change.

Arguments:

The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.

Semantics:

This intrinsic indicates that until an llvm.invariant.end that uses the return value, the referenced memory location is constant and unchanging.

‘llvm.invariant.end’ Intrinsic

Syntax:

This is an overloaded intrinsic. The allocated object can belong to any address space.

declare void @llvm.invariant.end.p0(ptr <start>, i64 <size>, ptr captures(none) <ptr>)

Overview:

The ‘llvm.invariant.end’ intrinsic specifies that the contents of an allocated object are mutable.

Arguments:

The first argument is the matching llvm.invariant.start intrinsic. The second argument is a constant integer representing the size of the object, or -1 if it is variable sized, and the third argument is a pointer to the object.

Semantics:

This intrinsic indicates that the memory is mutable again.

‘llvm.launder.invariant.group’ Intrinsic

Syntax:

This is an overloaded intrinsic. The allocated object can belong to any address space. The returned pointer must belong to the same address space as the argument.

declare ptr @llvm.launder.invariant.group.p0(ptr <ptr>)

Overview:

The ‘llvm.launder.invariant.group’ intrinsic can be used when an invariant established by invariant.group metadata no longer holds, to obtain a new pointer value that carries fresh invariant group information. It is an experimental intrinsic, which means that its semantics might change in the future.

Arguments:

The llvm.launder.invariant.group intrinsic takes only one argument, which is a pointer to the memory.

Semantics:

Returns another pointer that aliases its argument but which is considered different for the purposes of load/store invariant.group metadata. It does not read any accessible memory and the execution can be speculated.

‘llvm.strip.invariant.group’ Intrinsic

Syntax:

This is an overloaded intrinsic. The allocated object can belong to any address space. The returned pointer must belong to the same address space as the argument.

declare ptr @llvm.strip.invariant.group.p0(ptr <ptr>)

Overview:

The ‘llvm.strip.invariant.group’ intrinsic can be used when an invariant established by invariant.group metadata no longer holds, to obtain a new pointer value that does not carry the invariant information. It is an experimental intrinsic, which means that its semantics might change in the future.

Arguments:

The llvm.strip.invariant.group intrinsic takes only one argument, which is a pointer to the memory.

Semantics:

Returns another pointer that aliases its argument but which has no associated invariant.group metadata. It does not read any memory and can be speculated.

Constrained Floating-Point Intrinsics

These intrinsics are used to provide special handling of floating-point operations when specific rounding mode or floating-point exception behavior is required. By default, LLVM optimization passes assume that the rounding mode is round-to-nearest and that floating-point exceptions will not be monitored. Constrained FP intrinsics are used to support non-default rounding modes and accurately preserve exception behavior without compromising LLVM’s ability to optimize FP code when the default behavior is used.

If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR. Optimizations that move code around can create miscompiles if constrained and normal operations are mixed. The correct way to mix constrained and less constrained operations is to use the rounding mode and exception handling metadata to mark constrained intrinsics as having LLVM’s default behavior.

Each of these intrinsics corresponds to a normal floating-point operation. The data arguments and the return value are the same as the corresponding FP operation.

The rounding mode argument is a metadata string specifying what assumptions, if any, the optimizer can make when transforming constant values. Some constrained FP intrinsics omit this argument. If required by the intrinsic, this argument must be one of the following strings:

"round.dynamic"
"round.tonearest"
"round.downward"
"round.upward"
"round.towardzero"
"round.tonearestaway"

If this argument is “round.dynamic” optimization passes must assume that the rounding mode is unknown and may change at runtime. No transformations that depend on rounding mode may be performed in this case.

The other possible values for the rounding mode argument correspond to the similarly named IEEE rounding modes. If the argument is any of these values optimization passes may perform transformations as long as they are consistent with the specified rounding mode.

For example, ‘x-0’->‘x’ is not a valid transformation if the rounding mode is “round.downward” or “round.dynamic” because if the value of ‘x’ is +0 then ‘x-0’ should evaluate to ‘-0’ when rounding downward. However, this transformation is legal for all other rounding modes.
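The sign-of-zero effect can be observed with Python’s decimal module. This is an analogy rather than a demonstration on binary floating point: the standard library does not expose the binary rounding mode, so the sketch relies on the assumption that decimal arithmetic applies the same rule for the sign of an exact zero difference.

```python
from decimal import Context, Decimal, ROUND_FLOOR, ROUND_HALF_EVEN

x = Decimal("0")     # +0
zero = Decimal("0")  # +0

# Same subtraction, two rounding modes.
downward = Context(rounding=ROUND_FLOOR).subtract(x, zero)
nearest = Context(rounding=ROUND_HALF_EVEN).subtract(x, zero)

downward_is_negative_zero = downward.is_zero() and downward.is_signed()
nearest_is_positive_zero = nearest.is_zero() and not nearest.is_signed()
```

Under the downward mode the result is a negative zero, so replacing ‘x-0’ with ‘x’ would change the sign of the result when ‘x’ is +0.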

For values other than “round.dynamic” optimization passes may assume that the actual runtime rounding mode (as defined in a target-specific manner) matches the specified rounding mode, but this is not guaranteed. Using a specific non-dynamic rounding mode which does not match the actual rounding mode at runtime results in undefined behavior.

The exception behavior argument is a metadata string describing the floating point exception semantics that are required for the intrinsic. This argument must be one of the following strings:

"fpexcept.ignore"
"fpexcept.maytrap"
"fpexcept.strict"

If this argument is “fpexcept.ignore” optimization passes may assume that the exception status flags will not be read and that floating-point exceptions will be masked. This allows transformations to be performed that may change the exception semantics of the original code. For example, FP operations may be speculatively executed in this case whereas they must not be for either of the other possible values of this argument.

If the exception behavior argument is “fpexcept.maytrap” optimization passes must avoid transformations that may raise exceptions that would not have been raised by the original code (such as speculatively executing FP operations), but passes are not required to preserve all exceptions that are implied by the original code. For example, exceptions may be potentially hidden by constant folding.

If the exception behavior argument is “fpexcept.strict” all transformations must strictly preserve the floating-point exception semantics of the original code. Any FP exception that would have been raised by the original code must be raised by the transformed code, and the transformed code must not raise any FP exceptions that would not have been raised by the original code. This is the exception behavior argument that will be used if the code being compiled reads the FP exception status flags, but this mode can also be used with code that unmasks FP exceptions.

The number and order of floating-point exceptions is NOT guaranteed. For example, a series of FP operations that each may raise exceptions may be vectorized into a single instruction that raises each unique exception a single time.

Proper use of function attributes is required for the constrained intrinsics to function correctly.

All function calls done in a function that uses constrained floating point intrinsics must have the strictfp attribute either on the calling instruction or on the declaration or definition of the function being called.

All function definitions that use constrained floating point intrinsics must have the strictfp attribute.

‘llvm.experimental.constrained.fadd’ Intrinsic

Syntax:

declare <type> @llvm.experimental.constrained.fadd(<type> <op1>, <type> <op2>, metadata <rounding mode>, metadata <exception behavior>)

Overview:

The ‘llvm.experimental.constrained.fadd’ intrinsic returns the sum of its two arguments.

Arguments:

The first two arguments to the ‘llvm.experimental.constrained.fadd’ intrinsic must be floating-point or vector of floating-point values. Both arguments must have identical types.

The third and fourth arguments specify the rounding mode and exception behavior as described above.

Semantics:

The value produced is the floating-point sum of the two value arguments and has the same type as the arguments.

‘llvm.experimental.constrained.fsub’ Intrinsic

Syntax:

declare <type> @llvm.experimental.constrained.fsub(<type> <op1>, <type> <op2>, metadata <rounding mode>, metadata <exception behavior>)

Overview:

The ‘llvm.experimental.constrained.fsub’ intrinsic returns the difference of its two arguments.

Arguments:

The first two arguments to the ‘llvm.experimental.constrained.fsub’ intrinsic must be floating-point or vector of floating-point values. Both arguments must have identical types.

The third and fourth arguments specify the rounding mode and exception behavior as described above.

Semantics:

The value produced is the floating-point difference of the two value arguments and has the same type as the arguments.

‘llvm.experimental.constrained.fmul’ Intrinsic

Syntax:

declare <type> @llvm.experimental.constrained.fmul(<type> <op1>, <type> <op2>, metadata <rounding mode>, metadata <exception behavior>)

Overview:

The ‘llvm.experimental.constrained.fmul’ intrinsic returns the product of its two arguments.

Arguments:

The first two arguments to the ‘llvm.experimental.constrained.fmul’ intrinsic must be floating-point or vector of floating-point values. Both arguments must have identical types.

The third and fourth arguments specify the rounding mode and exception behavior as described above.

Semantics:

The value produced is the floating-point product of the two value arguments and has the same type as the arguments.

‘llvm.experimental.constrained.fdiv’ Intrinsic

Syntax:

declare <type> @llvm.experimental.constrained.fdiv(<type> <op1>, <type> <op2>, metadata <rounding mode>, metadata <exception behavior>)

Overview:

The ‘llvm.experimental.constrained.fdiv’ intrinsic returns the quotient of its two arguments.

Arguments:

The first two arguments to the ‘llvm.experimental.constrained.fdiv’ intrinsic must be floating-point or vector of floating-point values. Both arguments must have identical types.

The third and fourth arguments specify the rounding mode and exception behavior as described above.

Semantics:

The value produced is the floating-point quotient of the two value arguments and has the same type as the arguments.

‘llvm.experimental.constrained.frem’ Intrinsic

Syntax:

declare <type> @llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>, metadata <rounding mode>, metadata <exception behavior>)

Overview:

The ‘llvm.experimental.constrained.frem’ intrinsic returns the remainder from the division of its two arguments.

Arguments:

The first two arguments to the ‘llvm.experimental.constrained.frem’ intrinsic must be floating-point or vector of floating-point values. Both arguments must have identical types.

The third and fourth arguments specify the rounding mode and exception behavior as described above. The rounding mode argument has no effect, since the result of frem is never rounded, but the argument is included for consistency with the other constrained floating-point intrinsics.

Semantics:

The value produced is the floating-point remainder from the division of the two value arguments and has the same type as the arguments. The remainder has the same sign as the dividend.
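The sign convention can be illustrated in Python, assuming that math.fmod (which wraps the C library fmod) follows the same dividend-sign rule described here. The contrast with Python’s % operator, whose result takes the sign of the divisor, is included only to show that the two conventions differ.

```python
import math

dividend = -7.5
divisor = 2.0

# Remainder with the sign of the dividend, as described for frem.
rem_dividend_sign = math.fmod(dividend, divisor)

# Python's % operator uses the sign of the divisor instead.
rem_divisor_sign = dividend % divisor
```

Both values are exact, which is consistent with the note that the rounding mode has no effect on this operation.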

‘llvm.experimental.constrained.fma’ Intrinsic

Syntax:

declare <type> @llvm.experimental.constrained.fma(<type> <op1>, <type> <op2>, <type> <op3>, metadata <rounding mode>, metadata <exception behavior>)

Overview:

The ‘llvm.experimental.constrained.fma’ intrinsic returns the result of a fused-multiply-add operation on its arguments.

Arguments:

The first three arguments to the ‘llvm.experimental.constrained.fma’ intrinsic must be floating-point or vector of floating-point values. All arguments must have identical types.

The fourth and fifth arguments specify the rounding mode and exception behavior as described above.

Semantics:

The result produced is the product of the first two arguments added to the third argument computed with infinite precision, and then rounded to the target precision.
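The effect of rounding once instead of twice can be sketched in Python with exact rational arithmetic. The helper fma_reference is hypothetical; it assumes double precision, finite inputs, and round-to-nearest for the final conversion, and is not how any target computes the operation.

```python
from fractions import Fraction


def fma_reference(a, b, c):
    """Compute a*b + c exactly, then round once to double precision."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fma_reference(a, b, c)  # single rounding
unfused = a * b + c             # product is rounded before the add
```

The exact product is 1 - 2^-60, which rounds to 1.0 as a double, so the separate multiply and add return 0.0 while the fused result keeps the small negative term.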

‘llvm.experimental.constrained.fptoui’ Intrinsic

Syntax:

declare <ty2> @llvm.experimental.constrained.fptoui(<type> <value>, metadata <exception behavior>)

Overview:

The ‘llvm.experimental.constrained.fptoui’ intrinsic converts a floating-point value to its unsigned integer equivalent of type ty2.

Arguments:

The first argument to the ‘llvm.experimental.constrained.fptoui’ intrinsic must be floating point or vector of floating point values.

The second argument specifies the exception behavior as described above.

Semantics:

The result produced is an unsigned integer converted from the floating point argument. The value is truncated, so it is rounded towards zero.

‘llvm.experimental.constrained.fptosi’ Intrinsic

Syntax:
declare <ty2> @llvm.experimental.constrained.fptosi(<type> <value>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.fptosi’ intrinsic converts a floating-point value to its signed integer equivalent of type ty2.

Arguments:

The first argument to the ‘llvm.experimental.constrained.fptosi’ intrinsic must be floating point or vector of floating point values.

The second argument specifies the exception behavior as described above.

Semantics:

The result produced is a signed integer converted from the floating point argument. The value is truncated, so it is rounded towards zero.
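The conversion takes no rounding mode argument because the value is always truncated. A minimal sketch of a vector conversion, assuming the .v4i32.v4f32 overload suffix and a hypothetical function name @to_ints:

```llvm
; Overload converting four floats to four signed 32-bit integers.
declare <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32(<4 x float>, metadata)

; Hypothetical caller: each lane is truncated towards zero.
define <4 x i32> @to_ints(<4 x float> %v) strictfp {
  %r = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32(
           <4 x float> %v,
           metadata !"fpexcept.strict") strictfp
  ret <4 x i32> %r
}
```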

‘llvm.experimental.constrained.uitofp’ Intrinsic

Syntax:
declare <ty2> @llvm.experimental.constrained.uitofp(<type> <value>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.uitofp’ intrinsic converts an unsigned integer value to a floating-point value of type ty2.

Arguments:

The first argument to the ‘llvm.experimental.constrained.uitofp’ intrinsic must be an integer or vector of integer values.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

An inexact floating-point exception will be raised if rounding is required. Any result produced is a floating point value converted from the input integer argument.

‘llvm.experimental.constrained.sitofp’ Intrinsic

Syntax:
declare <ty2> @llvm.experimental.constrained.sitofp(<type> <value>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.sitofp’ intrinsic converts a signed integer value to a floating-point value of type ty2.

Arguments:

The first argument to the ‘llvm.experimental.constrained.sitofp’ intrinsic must be an integer or vector of integer values.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

An inexact floating-point exception will be raised if rounding is required. Any result produced is a floating point value converted from the input integer argument.
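Rounding can be required here because float cannot represent every 32-bit integer exactly. The sketch below assumes the .f32.i32 overload suffix; the function name @int_to_float is hypothetical.

```llvm
; Overload converting a signed 32-bit integer to float.
declare float @llvm.experimental.constrained.sitofp.f32.i32(i32, metadata, metadata)

; Hypothetical caller: values that do not fit in float's significand
; are rounded, which raises the inexact exception.
define float @int_to_float(i32 %n) strictfp {
  %r = call float @llvm.experimental.constrained.sitofp.f32.i32(
           i32 %n,
           metadata !"round.dynamic",
           metadata !"fpexcept.strict") strictfp
  ret float %r
}
```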

‘llvm.experimental.constrained.fptrunc’ Intrinsic

Syntax:
declare <ty2> @llvm.experimental.constrained.fptrunc(<type> <value>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.fptrunc’ intrinsic truncates value to type ty2.

Arguments:

The first argument to the ‘llvm.experimental.constrained.fptrunc’ intrinsic must be floating point or vector of floating point values. This argument must be larger in size than the result.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

The result produced is a floating point value truncated to be smaller in size than the argument.
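A minimal sketch of a double to float truncation, assuming the .f32.f64 overload suffix (result type first, then argument type) and a hypothetical function name @narrow:

```llvm
; Overload truncating double to float.
declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)

; Hypothetical caller: the argument type (double) is larger than the
; result type (float), as the intrinsic requires.
define float @narrow(double %x) strictfp {
  %r = call float @llvm.experimental.constrained.fptrunc.f32.f64(
           double %x,
           metadata !"round.dynamic",
           metadata !"fpexcept.strict") strictfp
  ret float %r
}
```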

‘llvm.experimental.constrained.fpext’ Intrinsic

Syntax:
declare <ty2> @llvm.experimental.constrained.fpext(<type> <value>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.fpext’ intrinsic extends a floating-point value to a larger floating-point value.

Arguments:

The first argument to the ‘llvm.experimental.constrained.fpext’ intrinsic must be floating point or vector of floating point values. This argument must be smaller in size than the result.

The second argument specifies the exception behavior as described above.

Semantics:

The result produced is a floating point value extended to be larger in size than the argument. All restrictions that apply to the fpext instruction also apply to this intrinsic.

‘llvm.experimental.constrained.fcmp’ and ‘llvm.experimental.constrained.fcmps’ Intrinsics

Syntax:
declare <ty2> @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>)
declare <ty2> @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.fcmp’ and ‘llvm.experimental.constrained.fcmps’ intrinsics return a boolean value or vector of boolean values based on comparison of their arguments.

If the arguments are floating-point scalars, then the result type is a boolean (i1).

If the arguments are floating-point vectors, then the result type is a vector of boolean with the same number of elements as the arguments being compared.

The ‘llvm.experimental.constrained.fcmp’ intrinsic performs a quiet comparison operation while the ‘llvm.experimental.constrained.fcmps’ intrinsic performs a signaling comparison operation.

Arguments:

The first two arguments to the ‘llvm.experimental.constrained.fcmp’ and ‘llvm.experimental.constrained.fcmps’ intrinsics must be floating-point or vector of floating-point values. Both arguments must have identical types.

The third argument is the condition code indicating the kind of comparison to perform. It must be a metadata string with one of the following values:

  • “oeq”: ordered and equal

  • “ogt”: ordered and greater than

  • “oge”: ordered and greater than or equal

  • “olt”: ordered and less than

  • “ole”: ordered and less than or equal

  • “one”: ordered and not equal

  • “ord”: ordered (no nans)

  • “ueq”: unordered or equal

  • “ugt”: unordered or greater than

  • “uge”: unordered or greater than or equal

  • “ult”: unordered or less than

  • “ule”: unordered or less than or equal

  • “une”: unordered or not equal

  • “uno”: unordered (either nans)

Ordered means that neither argument is a NAN while unordered means that either argument may be a NAN.

The fourth argument specifies the exception behavior as described above.

Semantics:

op1 and op2 are compared according to the condition code given as the third argument. If the arguments are vectors, then the vectors are compared element by element. Each comparison performed always yields an i1 result, as follows:

  • “oeq”: yields true if both arguments are not a NAN and op1 is equal to op2.

  • “ogt”: yields true if both arguments are not a NAN and op1 is greater than op2.

  • “oge”: yields true if both arguments are not a NAN and op1 is greater than or equal to op2.

  • “olt”: yields true if both arguments are not a NAN and op1 is less than op2.

  • “ole”: yields true if both arguments are not a NAN and op1 is less than or equal to op2.

  • “one”: yields true if both arguments are not a NAN and op1 is not equal to op2.

  • “ord”: yields true if both arguments are not a NAN.

  • “ueq”: yields true if either argument is a NAN or op1 is equal to op2.

  • “ugt”: yields true if either argument is a NAN or op1 is greater than op2.

  • “uge”: yields true if either argument is a NAN or op1 is greater than or equal to op2.

  • “ult”: yields true if either argument is a NAN or op1 is less than op2.

  • “ule”: yields true if either argument is a NAN or op1 is less than or equal to op2.

  • “une”: yields true if either argument is a NAN or op1 is not equal to op2.

  • “uno”: yields true if either argument is a NAN.

The quiet comparison operation performed by ‘llvm.experimental.constrained.fcmp’ will only raise an exception if either argument is a SNAN. The signaling comparison operation performed by ‘llvm.experimental.constrained.fcmps’ will raise an exception if either argument is a NAN (QNAN or SNAN). Such an exception does not preclude a result being produced (e.g. the exception might only set a flag), therefore the distinction between ordered and unordered comparisons is also relevant for the ‘llvm.experimental.constrained.fcmps’ intrinsic.
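The difference between the quiet and signaling forms can be sketched as follows. The .f64 overload suffix is an assumption based on the usual naming of overloaded intrinsics, and the function names @quiet_lt and @signaling_ult are hypothetical.

```llvm
declare i1 @llvm.experimental.constrained.fcmp.f64(double, double, metadata, metadata)
declare i1 @llvm.experimental.constrained.fcmps.f64(double, double, metadata, metadata)

; Hypothetical caller: quiet "ordered and less than" comparison.
; An exception is raised only if %x or %y is a SNAN.
define i1 @quiet_lt(double %x, double %y) strictfp {
  %r = call i1 @llvm.experimental.constrained.fcmp.f64(
           double %x, double %y,
           metadata !"olt",
           metadata !"fpexcept.strict") strictfp
  ret i1 %r
}

; Hypothetical caller: signaling "unordered or less than" comparison.
; An exception is raised if either argument is any NAN; the result is
; still true in that case because "ult" is an unordered condition code.
define i1 @signaling_ult(double %x, double %y) strictfp {
  %r = call i1 @llvm.experimental.constrained.fcmps.f64(
           double %x, double %y,
           metadata !"ult",
           metadata !"fpexcept.strict") strictfp
  ret i1 %r
}
```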

‘llvm.experimental.constrained.fmuladd’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.fmuladd(<type> <op1>, <type> <op2>, <type> <op3>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.fmuladd’ intrinsic represents multiply-add expressions that can be fused if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) that the fused operation is more efficient than the equivalent, separate pair of mul and add instructions.

Arguments:

The first three arguments to the ‘llvm.experimental.constrained.fmuladd’ intrinsic must be floating-point or vector of floating-point values. All three arguments must have identical types.

The fourth and fifth arguments specify the rounding mode and exception behavior as described above.

Semantics:

The expression:

%0 = call float @llvm.experimental.constrained.fmuladd.f32(%a, %b, %c, metadata <rounding mode>, metadata <exception behavior>)

is equivalent to the expression:

%0 = call float @llvm.experimental.constrained.fmul.f32(%a, %b, metadata <rounding mode>, metadata <exception behavior>)
%1 = call float @llvm.experimental.constrained.fadd.f32(%0, %c, metadata <rounding mode>, metadata <exception behavior>)

except that it is unspecified whether rounding will be performed between the multiplication and addition steps. Fusion is not guaranteed, even if the target platform supports it. If a fused multiply-add is required, the corresponding llvm.experimental.constrained.fma intrinsic function should be used instead. This never sets errno, just as ‘llvm.experimental.constrained.fma.*’.

Constrained libm-equivalent Intrinsics

In addition to the basic floating-point operations for which constrained intrinsics are described above, there are constrained versions of various operations which provide equivalent behavior to a corresponding libm function. These intrinsics allow the precise behavior of these operations with respect to rounding mode and exception behavior to be controlled.

As with the basic constrained floating-point intrinsics, the rounding mode and exception behavior arguments only control the behavior of the optimizer. They do not change the runtime floating-point environment.

‘llvm.experimental.constrained.sqrt’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.sqrt(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.sqrt’ intrinsic returns the square root of the specified value, returning the same value as the libm ‘sqrt’ functions would, but without setting errno.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the nonnegative square root of the specified value. If the value is less than negative zero, a floating-point exception occurs and the return value is architecture specific.
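The unary libm-equivalent intrinsics in this section share the same call shape. A minimal sketch for sqrt, assuming the .f64 overload suffix and a hypothetical function name @root:

```llvm
declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)

; Hypothetical caller: the metadata arguments inform the optimizer and
; do not change the runtime floating-point environment.
define double @root(double %x) strictfp {
  %r = call double @llvm.experimental.constrained.sqrt.f64(
           double %x,
           metadata !"round.dynamic",
           metadata !"fpexcept.strict") strictfp
  ret double %r
}
```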

‘llvm.experimental.constrained.pow’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.pow(<type> <op1>, <type> <op2>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.pow’ intrinsic returns the first argument raised to the (positive or negative) power specified by the second argument.

Arguments:

The first two arguments and the return value are floating-point numbers of the same type. The second argument specifies the power to which the first argument should be raised.

The third and fourth arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the first value raised to the second power, returning the same values as the libm pow functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.powi’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.powi(<type> <op1>, i32 <op2>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.powi’ intrinsic returns the first argument raised to the (positive or negative) power specified by the second argument. The order of evaluation of multiplications is not defined. When a vector of floating-point type is used, the second argument remains a scalar integer value.

Arguments:

The first argument and the return value are floating-point numbers of the same type. The second argument is a 32-bit signed integer specifying the power to which the first argument should be raised.

The third and fourth arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the first value raised to the second power with an unspecified sequence of rounding operations.
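The scalar exponent with a vector base can be sketched as below, assuming the .v4f32 overload suffix; the function name @cube_lanes is hypothetical.

```llvm
; Vector overload: the exponent stays a scalar i32.
declare <4 x float> @llvm.experimental.constrained.powi.v4f32(<4 x float>, i32, metadata, metadata)

; Hypothetical caller: raises every lane of %v to the third power.
define <4 x float> @cube_lanes(<4 x float> %v) strictfp {
  %r = call <4 x float> @llvm.experimental.constrained.powi.v4f32(
           <4 x float> %v, i32 3,
           metadata !"round.dynamic",
           metadata !"fpexcept.strict") strictfp
  ret <4 x float> %r
}
```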

‘llvm.experimental.constrained.ldexp’ Intrinsic

Syntax:
declare <type0> @llvm.experimental.constrained.ldexp(<type0> <op1>, <type1> <op2>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.ldexp’ intrinsic performs the ldexp function.

Arguments:

The first argument and the return value are floating-point or vector of floating-point values of the same type. The second argument is an integer with the same number of elements.

The third and fourth arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function multiplies the first argument by 2 raised to the second argument’s power. If the first argument is NaN or infinite, the same value is returned. If the result underflows, a zero with the same sign is returned. If the result overflows, the result is an infinity with the same sign.
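A minimal sketch of a scalar call, assuming the .f32.i32 overload suffix (floating-point type first, then the integer exponent type) and a hypothetical function name @scale:

```llvm
declare float @llvm.experimental.constrained.ldexp.f32.i32(float, i32, metadata, metadata)

; Hypothetical caller: computes %x * 2^%n.
define float @scale(float %x, i32 %n) strictfp {
  %r = call float @llvm.experimental.constrained.ldexp.f32.i32(
           float %x, i32 %n,
           metadata !"round.dynamic",
           metadata !"fpexcept.strict") strictfp
  ret float %r
}
```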

‘llvm.experimental.constrained.sin’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.sin(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.sin’ intrinsic returns the sine of the first argument.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the sine of the specified argument, returning the same values as the libm sin functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.cos’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.cos(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.cos’ intrinsic returns the cosine of the first argument.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the cosine of the specified argument, returning the same values as the libm cos functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.tan’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.tan(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.tan’ intrinsic returns the tangent of the first argument.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the tangent of the specified argument, returning the same values as the libm tan functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.asin’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.asin(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.asin’ intrinsic returns the arcsine of the first operand.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the arcsine of the specified operand, returning the same values as the libm asin functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.acos’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.acos(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.acos’ intrinsic returns the arccosine of the first operand.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the arccosine of the specified operand, returning the same values as the libm acos functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.atan’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.atan(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.atan’ intrinsic returns the arctangent of the first operand.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the arctangent of the specified operand, returning the same values as the libm atan functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.atan2’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.atan2(<type> <op1>, <type> <op2>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.atan2’ intrinsic returns the arctangent of <op1> divided by <op2> accounting for the quadrant.

Arguments:

The first two arguments and the return value are floating-point numbers of the same type.

The third and fourth arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the quadrant-specific arctangent using the specified operands, returning the same values as the libm atan2 functions would, and handles error conditions in the same way.
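The binary libm-equivalent intrinsics take both value operands before the metadata arguments. A minimal sketch for atan2, assuming the .f64 overload suffix and a hypothetical function name @angle_of:

```llvm
declare double @llvm.experimental.constrained.atan2.f64(double, double, metadata, metadata)

; Hypothetical caller: %y is <op1> and %x is <op2>, matching the
; argument order of the libm atan2 function.
define double @angle_of(double %y, double %x) strictfp {
  %r = call double @llvm.experimental.constrained.atan2.f64(
           double %y, double %x,
           metadata !"round.dynamic",
           metadata !"fpexcept.strict") strictfp
  ret double %r
}
```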

‘llvm.experimental.constrained.sinh’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.sinh(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.sinh’ intrinsic returns the hyperbolic sine of the first operand.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the hyperbolic sine of the specified operand, returning the same values as the libm sinh functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.cosh’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.cosh(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.cosh’ intrinsic returns the hyperbolic cosine of the first operand.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the hyperbolic cosine of the specified operand, returning the same values as the libm cosh functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.tanh’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.tanh(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.tanh’ intrinsic returns the hyperbolic tangent of the first operand.

Arguments:

The first argument and the return type are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the hyperbolic tangent of the specified operand, returning the same values as the libm tanh functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.exp’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.exp(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.exp’ intrinsic computes the base-e exponential of the specified value.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm exp functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.exp2’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.exp2(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.exp2’ intrinsic computes the base-2 exponential of the specified value.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm exp2 functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.log’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.log(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.log’ intrinsic computes the base-e logarithm of the specified value.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm log functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.log10’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.log10(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.log10’ intrinsic computes the base-10 logarithm of the specified value.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm log10 functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.log2’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.log2(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.log2’ intrinsic computes the base-2 logarithm of the specified value.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm log2 functions would, and handles error conditions in the same way.

‘llvm.experimental.constrained.rint’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.rint(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.rint’ intrinsic returns the first argument rounded to the nearest integer. It may raise an inexact floating-point exception if the argument is not an integer.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm rint functions would, and handles error conditions in the same way. The rounding mode is described, not determined, by the rounding mode argument. The actual rounding mode is determined by the runtime floating-point environment. The rounding mode argument is only intended as information to the compiler.

‘llvm.experimental.constrained.lrint’ Intrinsic

Syntax:
declare <inttype> @llvm.experimental.constrained.lrint(<fptype> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.lrint’ intrinsic returns the first argument rounded to the nearest integer. An inexact floating-point exception will be raised if the argument is not an integer. If the rounded value is too large to fit into the result type, an invalid exception is raised, and the return value is a non-deterministic value (equivalent to freeze poison).

Arguments:

The first argument is a floating-point number. The return value is an integer type. Not all types are supported on all targets. The supported types are the same as the llvm.lrint intrinsic and the lrint libm functions.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm lrint functions would, and handles error conditions in the same way.

The rounding mode is described, not determined, by the rounding mode argument. The actual rounding mode is determined by the runtime floating-point environment. The rounding mode argument is only intended as information to the compiler.

If the runtime floating-point environment is using the default rounding mode then the results will be the same as the llvm.lrint intrinsic.
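Because the result and argument types differ, the overload is named with both. The sketch below assumes the .i64.f64 suffix (result type first) and a target where a 64-bit result is supported; the function name @to_long is hypothetical.

```llvm
declare i64 @llvm.experimental.constrained.lrint.i64.f64(double, metadata, metadata)

; Hypothetical caller: "round.dynamic" tells the compiler that the
; rounding mode is not known statically; the runtime floating-point
; environment decides how %x is actually rounded.
define i64 @to_long(double %x) strictfp {
  %r = call i64 @llvm.experimental.constrained.lrint.i64.f64(
           double %x,
           metadata !"round.dynamic",
           metadata !"fpexcept.strict") strictfp
  ret i64 %r
}
```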

‘llvm.experimental.constrained.llrint’ Intrinsic

Syntax:
declare <inttype> @llvm.experimental.constrained.llrint(<fptype> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.llrint’ intrinsic returns the first argument rounded to the nearest integer. An inexact floating-point exception will be raised if the argument is not an integer. If the rounded value is too large to fit into the result type, an invalid exception is raised, and the return value is a non-deterministic value (equivalent to freeze poison).

Arguments:

The first argument is a floating-point number. The return value is an integer type. Not all types are supported on all targets. The supported types are the same as the llvm.llrint intrinsic and the llrint libm functions.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm llrint functions would, and handles error conditions in the same way.

The rounding mode is described, not determined, by the rounding mode argument. The actual rounding mode is determined by the runtime floating-point environment. The rounding mode argument is only intended as information to the compiler.

If the runtime floating-point environment is using the default rounding mode then the results will be the same as the llvm.llrint intrinsic.

‘llvm.experimental.constrained.nearbyint’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.nearbyint(<type> <op1>, metadata <rounding mode>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.nearbyint’ intrinsic returns the first argument rounded to the nearest integer. It will not raise an inexact floating-point exception if the argument is not an integer.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second and third arguments specify the rounding mode and exception behavior as described above.

Semantics:

This function returns the same values as the libm nearbyint functions would, and handles error conditions in the same way. The rounding mode is described, not determined, by the rounding mode argument. The actual rounding mode is determined by the runtime floating-point environment. The rounding mode argument is only intended as information to the compiler.

‘llvm.experimental.constrained.maxnum’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.maxnum(<type> <op1>, <type> <op2>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.maxnum’ intrinsic returns the maximum of the two arguments.

Arguments:

The first two arguments and the return value are floating-point numbers of the same type.

The third argument specifies the exception behavior as described above.

Semantics:

This function follows the IEEE-754-2008 semantics for maxNum.
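The minimum and maximum intrinsics take no rounding mode argument, only the exception behavior. A minimal sketch, assuming the .f64 overload suffix and a hypothetical function name @larger:

```llvm
declare double @llvm.experimental.constrained.maxnum.f64(double, double, metadata)

; Hypothetical caller: returns the maximum of %a and %b following
; the maxNum semantics referenced above.
define double @larger(double %a, double %b) strictfp {
  %r = call double @llvm.experimental.constrained.maxnum.f64(
           double %a, double %b,
           metadata !"fpexcept.strict") strictfp
  ret double %r
}
```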

‘llvm.experimental.constrained.minnum’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.minnum(<type> <op1>, <type> <op2>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.minnum’ intrinsic returns the minimum of the two arguments.

Arguments:

The first two arguments and the return value are floating-point numbers of the same type.

The third argument specifies the exception behavior as described above.

Semantics:

This function follows the IEEE-754-2008 semantics for minNum.

‘llvm.experimental.constrained.maximum’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.maximum(<type> <op1>, <type> <op2>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.maximum’ intrinsic returns the maximum of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.

Arguments:

The first two arguments and the return value are floating-point numbers of the same type.

The third argument specifies the exception behavior as described above.

Semantics:

This function follows semantics specified in the draft of IEEE 754-2019.

‘llvm.experimental.constrained.minimum’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.minimum(<type> <op1>, <type> <op2>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.minimum’ intrinsic returns the minimum of the two arguments, propagating NaNs and treating -0.0 as less than +0.0.

Arguments:

The first two arguments and the return value are floating-point numbers of the same type.

The third argument specifies the exception behavior as described above.

Semantics:

This function follows semantics specified in the draft of IEEE 754-2019.

‘llvm.experimental.constrained.ceil’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.ceil(<type> <op1>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.ceil’ intrinsic returns the ceiling of the first argument.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second argument specifies the exception behavior as described above.

Semantics:

This function returns the same values as the libm ceil functions would and handles error conditions in the same way.
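The rounding direction of this operation is fixed, so the call carries only the exception behavior argument. A minimal sketch, assuming the .f64 overload suffix and a hypothetical function name @round_up:

```llvm
declare double @llvm.experimental.constrained.ceil.f64(double, metadata)

; Hypothetical caller: no rounding mode metadata is passed.
define double @round_up(double %x) strictfp {
  %r = call double @llvm.experimental.constrained.ceil.f64(
           double %x,
           metadata !"fpexcept.strict") strictfp
  ret double %r
}
```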

‘llvm.experimental.constrained.floor’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.floor(<type> <op1>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.floor’ intrinsic returns the floor of the first argument.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second argument specifies the exception behavior as described above.

Semantics:

This function returns the same values as the libm floor functions would and handles error conditions in the same way.

‘llvm.experimental.constrained.round’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.round(<type> <op1>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.round’ intrinsic returns the first argument rounded to the nearest integer.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second argument specifies the exception behavior as described above.

Semantics:

This function returns the same values as the libm round functions would and handles error conditions in the same way.

‘llvm.experimental.constrained.roundeven’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.roundeven(<type> <op1>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.roundeven’ intrinsic returns the first argument rounded to the nearest integer in floating-point format, rounding halfway cases to even (that is, to the nearest value that is an even integer), regardless of the current rounding direction.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second argument specifies the exception behavior as described above.

Semantics:

This function implements the IEEE-754 operation roundToIntegralTiesToEven. It also behaves in the same way as the C standard function roundeven and can signal the invalid operation exception for a SNAN argument.

‘llvm.experimental.constrained.lround’ Intrinsic

Syntax:
declare <inttype> @llvm.experimental.constrained.lround(<fptype> <op1>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.lround’ intrinsic returns the first argument rounded to the nearest integer with ties away from zero. It will raise an inexact floating-point exception if the argument is not an integer. If the rounded value is too large to fit into the result type, an invalid exception is raised, and the return value is a non-deterministic value (equivalent to freeze poison).

Arguments:

The first argument is a floating-point number. The return value is an integer type. Not all types are supported on all targets. The supported types are the same as the llvm.lround intrinsic and the lround libm functions.

The second argument specifies the exception behavior as described above.

Semantics:

This function returns the same values as the libm lround functions would and handles error conditions in the same way.
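As an illustration, the sketch below converts a double to an i64 with this intrinsic. It assumes that the i64/double combination is supported on the target and that the overload is spelled with the result type suffix followed by the argument type suffix; `@to_nearest_i64` is a hypothetical name.

```llvm
declare i64 @llvm.experimental.constrained.lround.i64.f64(double, metadata)

; Hypothetical helper: 2.5 becomes 3 and -2.5 becomes -3 (ties away from zero).
; A non-integer input raises the inexact exception; a rounded value that does
; not fit in i64 raises invalid, and the returned value is then unspecified.
define i64 @to_nearest_i64(double %x) strictfp {
entry:
  %r = call i64 @llvm.experimental.constrained.lround.i64.f64(
           double %x, metadata !"fpexcept.strict") strictfp
  ret i64 %r
}
```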

‘llvm.experimental.constrained.llround’ Intrinsic

Syntax:
declare <inttype> @llvm.experimental.constrained.llround(<fptype> <op1>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.llround’ intrinsic returns the first argument rounded to the nearest integer with ties away from zero. It will raise an inexact floating-point exception if the argument is not an integer. If the rounded value is too large to fit into the result type, an invalid exception is raised, and the return value is a non-deterministic value (equivalent to freeze poison).

Arguments:

The first argument is a floating-point number. The return value is an integer type. Not all types are supported on all targets. The supported types are the same as the llvm.llround intrinsic and the llround libm functions.

The second argument specifies the exception behavior as described above.

Semantics:

This function returns the same values as the libm llround functions would and handles error conditions in the same way.

‘llvm.experimental.constrained.trunc’ Intrinsic

Syntax:
declare <type> @llvm.experimental.constrained.trunc(<type> <op1>, metadata <exception behavior>)
Overview:

The ‘llvm.experimental.constrained.trunc’ intrinsic returns the first argument rounded to the nearest integer not larger in magnitude than the argument.

Arguments:

The first argument and the return value are floating-point numbers of the same type.

The second argument specifies the exception behavior as described above.

Semantics:

This function returns the same values as the libm trunc functions would and handles error conditions in the same way.

‘llvm.experimental.noalias.scope.decl’ Intrinsic

Syntax:
declare void @llvm.experimental.noalias.scope.decl(metadata !id.scope.list)
Overview:

The llvm.experimental.noalias.scope.decl intrinsic identifies where a noalias scope is declared. When the intrinsic is duplicated, a decision must also be made about the scope: depending on the reason for the duplication, the scope might need to be duplicated as well.

Arguments:

The !id.scope.list argument is metadata that is a list of noalias metadata references. The format is identical to that required for noalias metadata. This list must have exactly one element.

Semantics:

The llvm.experimental.noalias.scope.decl intrinsic identifies where a noalias scope is declared. When the intrinsic is duplicated, a decision must also be made about the scope: depending on the reason for the duplication, the scope might need to be duplicated as well.

For example, when the intrinsic is used inside a loop body, and that loop is unrolled, the associated noalias scope must also be duplicated. Otherwise, the noalias property it signifies would spill across loop iterations, whereas it was only valid within a single iteration.

; This example shows two possible positions for noalias.decl and how they impact the semantics:
; If it is outside the loop (Version 1), then %a and %b are noalias across *all* iterations.
; If it is inside the loop (Version 2), then %a and %b are noalias only within *one* iteration.
define void @decl_in_loop(ptr %a.base, ptr %b.base) {
entry:
  ; call void @llvm.experimental.noalias.scope.decl(metadata !2) ; Version 1: noalias decl outside loop
  br label %loop

loop:
  %a = phi ptr [ %a.base, %entry ], [ %a.inc, %loop ]
  %b = phi ptr [ %b.base, %entry ], [ %b.inc, %loop ]
  ; call void @llvm.experimental.noalias.scope.decl(metadata !2) ; Version 2: noalias decl inside loop
  %val = load i8, ptr %a, !alias.scope !2
  store i8 %val, ptr %b, !noalias !2
  %a.inc = getelementptr inbounds i8, ptr %a, i64 1
  %b.inc = getelementptr inbounds i8, ptr %b, i64 1
  %cond = call i1 @cond()
  br i1 %cond, label %loop, label %exit

exit:
  ret void
}

; External loop condition, declared so that the module is self-contained.
declare i1 @cond()

!0 = !{!0} ; domain
!1 = !{!1, !0} ; scope
!2 = !{!1} ; scope list

Multiple calls to @llvm.experimental.noalias.scope.decl for the same scope are possible, but one should never dominate another. Violations are pointed out by the verifier as they indicate a problem in either a transformation pass or the input.

Floating Point Environment Manipulation intrinsics

These functions read or write the floating-point environment, such as the rounding mode or the state of floating-point exceptions. Altering the floating-point environment requires special care. See Floating Point Environment.

‘llvm.get.rounding’ Intrinsic

Syntax:
declare i32 @llvm.get.rounding()
Overview:

The ‘llvm.get.rounding’ intrinsic reads the current rounding mode.

Semantics:

The ‘llvm.get.rounding’ intrinsic returns the current rounding mode. The encoding of the returned values is the same as the result of FLT_ROUNDS, specified by the C standard:

0 - toward zero
1 - to nearest, ties to even
2 - toward positive infinity
3 - toward negative infinity
4 - to nearest, ties away from zero

Other values may be used to represent additional rounding modes supported by a target. These values are target-specific.
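A minimal sketch of how the returned encoding might be inspected is shown below. The helper name `@is_round_to_nearest_even` is hypothetical; the constant 1 is the encoding of "to nearest, ties to even" from the list above.

```llvm
declare i32 @llvm.get.rounding()

; Hypothetical helper: returns true when the current dynamic rounding mode
; is "to nearest, ties to even" (encoded as 1).
define i1 @is_round_to_nearest_even() {
entry:
  %mode = call i32 @llvm.get.rounding()
  %is.rne = icmp eq i32 %mode, 1
  ret i1 %is.rne
}
```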

‘llvm.set.rounding’ Intrinsic

Syntax:
declare void @llvm.set.rounding(i32 <val>)
Overview:

The ‘llvm.set.rounding’ intrinsic sets the current rounding mode.

Arguments:

The argument is the required rounding mode. The encoding of the rounding mode is the same as that used by ‘llvm.get.rounding’.

Semantics:

The ‘llvm.set.rounding’ intrinsic sets the current rounding mode. It is similar to the C library function ‘fesetround’; however, this intrinsic does not return any value and uses a platform-independent representation of IEEE rounding modes.
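A common pattern is to save the current mode, switch to another one for a single operation, and then restore the saved mode. The sketch below assumes a strictfp function and uses the constrained fadd intrinsic with a dynamic rounding mode so that the addition is not evaluated under the default-environment assumption; `@add_toward_zero` is a hypothetical name.

```llvm
declare i32 @llvm.get.rounding()
declare void @llvm.set.rounding(i32)
declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)

; Hypothetical helper: adds two doubles with rounding toward zero (encoding 0)
; and then restores whatever rounding mode was active on entry.
define double @add_toward_zero(double %a, double %b) strictfp {
entry:
  %saved = call i32 @llvm.get.rounding() strictfp
  call void @llvm.set.rounding(i32 0) strictfp
  %sum = call double @llvm.experimental.constrained.fadd.f64(
             double %a, double %b,
             metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
  call void @llvm.set.rounding(i32 %saved) strictfp
  ret double %sum
}
```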

‘llvm.get.fpenv’ Intrinsic

Syntax:
declare <integer_type> @llvm.get.fpenv()
Overview:

The ‘llvm.get.fpenv’ intrinsic returns bits of the current floating-point environment. The return value type is platform-specific.

Semantics:

The ‘llvm.get.fpenv’ intrinsic reads the current floating-point environment and returns it as an integer value.

‘llvm.set.fpenv’ Intrinsic

Syntax:
declare void @llvm.set.fpenv(<integer_type> <val>)
Overview:

The ‘llvm.set.fpenv’ intrinsic sets the current floating-point environment.

Arguments:

The argument is an integer representing the new floating-point environment. The integer type is platform-specific.

Semantics:

The ‘llvm.set.fpenv’ intrinsic sets the current floating-point environment to the state specified by the argument. The state may be previously obtained by a call to ‘llvm.get.fpenv’ or synthesized in a platform-dependent way.

‘llvm.reset.fpenv’ Intrinsic

Syntax:
declare void @llvm.reset.fpenv()
Overview:

The ‘llvm.reset.fpenv’ intrinsic sets the default floating-point environment.

Semantics:

The ‘llvm.reset.fpenv’ intrinsic sets the current floating-point environment to the default state. It is similar to the call ‘fesetenv(FE_DFL_ENV)’, except it does not return any value.

‘llvm.get.fpmode’ Intrinsic

Syntax:

The ‘llvm.get.fpmode’ intrinsic returns bits of the current floating-point control modes. The return value type is platform-specific.

declare <integer_type> @llvm.get.fpmode()
Overview:

The ‘llvm.get.fpmode’ intrinsic reads the current dynamic floating-point control modes and returns them as an integer value.

Arguments:

None.

Semantics:

The ‘llvm.get.fpmode’ intrinsic reads the current dynamic floating-point control modes, such as rounding direction, precision, treatment of denormals and so on. It is similar to the C library function ‘fegetmode’; however, this function does not store the set of control modes into memory but returns it as an integer value. Interpretation of the bits in this value is target-dependent.

‘llvm.set.fpmode’ Intrinsic

Syntax:

The ‘llvm.set.fpmode’ intrinsic sets the current floating-point control modes.

declare void @llvm.set.fpmode(<integer_type> <val>)
Overview:

The ‘llvm.set.fpmode’ intrinsic sets the current dynamic floating-point control modes.

Arguments:

The argument is a set of floating-point control modes, represented as an integer value in a target-dependent way.

Semantics:

The ‘llvm.set.fpmode’ intrinsic sets the current dynamic floating-point control modes to the state specified by the argument, which must be obtained by a call to ‘llvm.get.fpmode’ or constructed in a target-specific way. It is similar to the C library function ‘fesetmode’; however, this function does not read the set of control modes from memory but receives it as an integer value.

‘llvm.reset.fpmode’ Intrinsic

Syntax:
declare void @llvm.reset.fpmode()
Overview:

The ‘llvm.reset.fpmode’ intrinsic sets the default dynamic floating-point control modes.

Arguments:

None.

Semantics:

The ‘llvm.reset.fpmode’ intrinsic sets the current dynamic floating-point environment to the default state. It is similar to the C library function call ‘fesetmode(FE_DFL_MODE)’; however, this function does not return any value.
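The three control-mode intrinsics can be combined to run a region of code under the default modes and then restore the caller's modes. This is only a sketch: it assumes a target whose control modes fit in an i32 and assumes the overloads are spelled with an `.i32` suffix, neither of which is stated above. `@call_with_default_modes` and `@use_default_modes` are hypothetical names.

```llvm
; Assumption: the platform-specific integer type is i32 on this target.
declare i32 @llvm.get.fpmode.i32()
declare void @llvm.set.fpmode.i32(i32)
declare void @llvm.reset.fpmode()

; Hypothetical callee that expects the default control modes.
declare void @use_default_modes()

define void @call_with_default_modes() strictfp {
entry:
  ; Save the caller's dynamic control modes.
  %saved = call i32 @llvm.get.fpmode.i32() strictfp
  ; Switch to the default modes for the duration of the call.
  call void @llvm.reset.fpmode() strictfp
  call void @use_default_modes() strictfp
  ; Restore the saved modes before returning.
  call void @llvm.set.fpmode.i32(i32 %saved) strictfp
  ret void
}
```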

Floating-Point Test Intrinsics

These functions get properties of floating-point values.

‘llvm.is.fpclass’ Intrinsic

Syntax:
declare i1 @llvm.is.fpclass(<fptype> <op>, i32 <test>)
declare <N x i1> @llvm.is.fpclass(<vector-fptype> <op>, i32 <test>)
Overview:

The ‘llvm.is.fpclass’ intrinsic returns a boolean value or vector of boolean values depending on whether the first argument satisfies the test specified by the second argument.

If the first argument is a floating-point scalar, then the result type is a boolean (i1).

If the first argument is a floating-point vector, then the result type is a vector of boolean with the same number of elements as the first argument.

Arguments:

The first argument to the ‘llvm.is.fpclass’ intrinsic must be a floating-point value or a vector of floating-point values.

The second argument specifies which tests to perform. It must be a compile-time integer constant, each bit of which specifies a floating-point class:

Bit #    Floating-point class
0        Signaling NaN
1        Quiet NaN
2        Negative infinity
3        Negative normal
4        Negative subnormal
5        Negative zero
6        Positive zero
7        Positive subnormal
8        Positive normal
9        Positive infinity

Semantics:

The function checks if op belongs to any of the floating-point classes specified by test. If op is a vector, then the check is made element by element. Each check yields an i1 result, which is true if the element value satisfies the specified test. The argument test is a bit mask where each bit specifies a floating-point class to test. For example, the value 0x108 tests for a normal value: bits 3 and 8 are set in it, which means that the function returns true if op is a positive or negative normal value. The function never raises floating-point exceptions. The function does not canonicalize its input value and does not depend on the floating-point environment. If the floating-point environment has a zeroing treatment of subnormal input values (such as indicated by the "denormal-fp-math" attribute), a subnormal value will be observed (will not be implicitly treated as zero).
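The bit mask can be illustrated with two small tests. The sketch assumes the `.f64` and `.v4f32` overloads; `@is_normal` and `@is_nan_v4` are hypothetical names. The mask 264 is the decimal form of 0x108 from the example above, and 3 selects bits 0 and 1.

```llvm
declare i1 @llvm.is.fpclass.f64(double, i32 immarg)
declare <4 x i1> @llvm.is.fpclass.v4f32(<4 x float>, i32 immarg)

; 264 = 0x108: bit 3 (negative normal) and bit 8 (positive normal).
define i1 @is_normal(double %x) {
entry:
  %r = call i1 @llvm.is.fpclass.f64(double %x, i32 264)
  ret i1 %r
}

; 3 = 0x3: bit 0 (signaling NaN) and bit 1 (quiet NaN), checked per element.
define <4 x i1> @is_nan_v4(<4 x float> %v) {
entry:
  %r = call <4 x i1> @llvm.is.fpclass.v4f32(<4 x float> %v, i32 3)
  ret <4 x i1> %r
}
```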

General Intrinsics

This class of intrinsics is designed to be generic and has no specific purpose.

‘llvm.var.annotation’ Intrinsic

Syntax:
declare void @llvm.var.annotation(ptr <val>, ptr <str>, ptr <str>, i32 <int>)
Overview:

The ‘llvm.var.annotation’ intrinsic.

Arguments:

The first argument is a pointer to a value, the second is a pointer to a global string, the third is a pointer to a global string which is the source file name, and the last argument is the line number.

Semantics:

This intrinsic allows annotation of local variables with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; they are ignored by code generation and optimization.

‘llvm.ptr.annotation.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use ‘llvm.ptr.annotation’ on a pointer to an integer of any width. NOTE: you must specify an address space for the pointer. The identifier for the default address space is the integer ‘0’.

declare ptr @llvm.ptr.annotation.p0(ptr <val>, ptr <str>, ptr <str>, i32 <int>)
declare ptr @llvm.ptr.annotation.p1(ptr addrspace(1) <val>, ptr <str>, ptr <str>, i32 <int>)
Overview:

The ‘llvm.ptr.annotation’ intrinsic.

Arguments:

The first argument is a pointer to an integer value of arbitrary bitwidth (result of some expression), the second is a pointer to a global string, the third is a pointer to a global string which is the source file name, and the last argument is the line number. It returns the value of the first argument.

Semantics:

This intrinsic allows annotation of a pointer to an integer with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; transformations preserve annotations on a best-effort basis but are allowed to replace the intrinsic with its first argument without breaking semantics, and the intrinsic is completely dropped during instruction selection.

‘llvm.annotation.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use ‘llvm.annotation’ on any integer bit width.

declare i8 @llvm.annotation.i8(i8 <val>, ptr <str>, ptr <str>, i32 <int>)
declare i16 @llvm.annotation.i16(i16 <val>, ptr <str>, ptr <str>, i32 <int>)
declare i32 @llvm.annotation.i32(i32 <val>, ptr <str>, ptr <str>, i32 <int>)
declare i64 @llvm.annotation.i64(i64 <val>, ptr <str>, ptr <str>, i32 <int>)
declare i256 @llvm.annotation.i256(i256 <val>, ptr <str>, ptr <str>, i32 <int>)
Overview:

The ‘llvm.annotation’ intrinsic.

Arguments:

The first argument is an integer value (result of some expression), the second is a pointer to a global string, the third is a pointer to a global string which is the source file name, and the last argument is the line number. It returns the value of the first argument.

Semantics:

This intrinsic allows annotations to be put on arbitrary expressions with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; transformations preserve annotations on a best-effort basis but are allowed to replace the intrinsic with its first argument without breaking semantics, and the intrinsic is completely dropped during instruction selection.

‘llvm.codeview.annotation’ Intrinsic

Syntax:

This annotation emits a label at its program point and an associated S_ANNOTATION codeview record with some additional string metadata. This is used to implement MSVC’s __annotation intrinsic. It is marked noduplicate, so calls to this intrinsic prevent inlining and should be considered expensive.

declare void @llvm.codeview.annotation(metadata)
Arguments:

The argument should be an MDTuple containing any number of MDStrings.

‘llvm.trap’ Intrinsic

Syntax:
declare void @llvm.trap() cold noreturn nounwind
Overview:

The ‘llvm.trap’ intrinsic.

Arguments:

None.

Semantics:

This intrinsic is lowered to the target-dependent trap instruction. If the target does not have a trap instruction, this intrinsic will be lowered to a call of the abort() function.
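A typical use is to terminate execution on a failed runtime check. The following sketch shows a bounds-checked load; the function `@checked_load` and its parameters are hypothetical.

```llvm
declare void @llvm.trap() cold noreturn nounwind

; Hypothetical helper: loads element %idx of an i32 array of length %len,
; trapping if the index is out of bounds.
define i32 @checked_load(ptr %base, i64 %idx, i64 %len) {
entry:
  %in.bounds = icmp ult i64 %idx, %len
  br i1 %in.bounds, label %ok, label %fail

ok:
  %p = getelementptr inbounds i32, ptr %base, i64 %idx
  %v = load i32, ptr %p
  ret i32 %v

fail:
  ; llvm.trap does not return, so the block ends in unreachable.
  call void @llvm.trap()
  unreachable
}
```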

‘llvm.debugtrap’ Intrinsic

Syntax:
declare void @llvm.debugtrap() nounwind
Overview:

The ‘llvm.debugtrap’ intrinsic.

Arguments:

None.

Semantics:

This intrinsic is lowered to code which is intended to cause an execution trap with the intention of requesting the attention of a debugger.

‘llvm.ubsantrap’ Intrinsic

Syntax:
declare void @llvm.ubsantrap(i8 immarg) cold noreturn nounwind
Overview:

The ‘llvm.ubsantrap’ intrinsic.

Arguments:

An integer describing the kind of failure detected.

Semantics:

This intrinsic is lowered to code which is intended to cause an execution trap, embedding the argument into the encoding of that trap, if possible, so that crashes can be told apart.

Equivalent to @llvm.trap for targets that do not support this behavior.

‘llvm.stackprotector’ Intrinsic

Syntax:
declare void @llvm.stackprotector(ptr <guard>, ptr <slot>)
Overview:

The llvm.stackprotector intrinsic takes the guard and stores it onto the stack at slot. The stack slot is adjusted to ensure that it is placed on the stack before local variables.

Arguments:

The llvm.stackprotector intrinsic requires two pointer arguments. The first argument is the value loaded from the stack guard @__stack_chk_guard. The second argument is an alloca that has enough space to hold the value of the guard.

Semantics:

This intrinsic causes the prologue/epilogue inserter to force the position of the AllocaInst stack slot to be before local variables on the stack. This is to ensure that if a local variable on the stack is overwritten, it will destroy the value of the guard. When the function exits, the guard on the stack is checked against the original guard by llvm.stackprotectorcheck. If they are different, then llvm.stackprotectorcheck causes the program to abort by calling the __stack_chk_fail() function.

‘llvm.stackguard’ Intrinsic

Syntax:
declare ptr @llvm.stackguard()
Overview:

The llvm.stackguard intrinsic returns the system stack guard value.

It should not be generated by frontends, since it is only for internal usage. The reason why we create this intrinsic is that we still support IR form Stack Protector in FastISel.

Arguments:

None.

Semantics:

On some platforms, the value returned by this intrinsic remains unchanged between loads in the same thread. On other platforms, it returns the same global variable value, if any, e.g. @__stack_chk_guard.

Currently some platforms have IR-level customized stack guard loading (e.g. X86 Linux) that is not handled by llvm.stackguard(), while they should be in the future.

‘llvm.objectsize’ Intrinsic

Syntax:
declare i32 @llvm.objectsize.i32(ptr <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
declare i64 @llvm.objectsize.i64(ptr <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
Overview:

The llvm.objectsize intrinsic is designed to provide information to the optimizer to determine whether a) an operation (like memcpy) will overflow a buffer that corresponds to an object, or b) that a runtime check for overflow isn’t necessary. An object in this context means an allocation of a specific class, structure, array, or other object.

Arguments:

The llvm.objectsize intrinsic takes four arguments. The first argument is a pointer to or into the object. The second argument determines whether llvm.objectsize returns 0 (if true) or -1 (if false) when the object size is unknown. The third argument controls how llvm.objectsize acts when null in address space 0 is used as its pointer argument. If it’s false, llvm.objectsize reports 0 bytes available when given null. Otherwise, if the null is in a non-zero address space or if true is given for the third argument of llvm.objectsize, we assume its size is unknown. The fourth argument to llvm.objectsize determines if the value should be evaluated at runtime.

The second, third, and fourth arguments only accept constants.

Semantics:

The llvm.objectsize intrinsic is lowered to a value representing the size of the object concerned. If the size cannot be determined, llvm.objectsize returns i32/i64 -1 or 0 (depending on the min argument).
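As an illustration, the sketch below queries the number of bytes remaining after a pointer into a 16-byte stack buffer. It assumes the fully mangled overload name `@llvm.objectsize.i64.p0` (result type followed by a pointer in address space 0); `@remaining_bytes` is a hypothetical name. Since the pointer is 4 bytes into a 16-byte allocation, an optimizer is expected to be able to fold the call to 12.

```llvm
declare i64 @llvm.objectsize.i64.p0(ptr, i1 immarg, i1 immarg, i1 immarg)

; Hypothetical helper: bytes available from %p to the end of %buf.
define i64 @remaining_bytes() {
entry:
  %buf = alloca [16 x i8]
  %p = getelementptr inbounds i8, ptr %buf, i64 4
  ; min = false:         report -1 if the size is unknown
  ; nullunknown = true:  treat a null pointer as having unknown size
  ; dynamic = false:     do not compute the size at run time
  %n = call i64 @llvm.objectsize.i64.p0(ptr %p, i1 false, i1 true, i1 false)
  ret i64 %n
}
```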

‘llvm.expect’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.expect on any integer bit width.

declare i1 @llvm.expect.i1(i1 <val>, i1 <expected_val>)
declare i32 @llvm.expect.i32(i32 <val>, i32 <expected_val>)
declare i64 @llvm.expect.i64(i64 <val>, i64 <expected_val>)
Overview:

The llvm.expect intrinsic provides information about the expected (most probable) value of val, which can be used by optimizers.

Arguments:

The llvm.expect intrinsic takes two arguments. The first argument is a value. The second argument is an expected value.

Semantics:

This intrinsic is lowered to val.
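The usual pattern is to wrap a branch condition so that the optimizer knows which successor is the likely one. In the sketch below, `@check` and `@handle_error` are hypothetical names; the error branch is marked as unlikely by expecting the condition to be false.

```llvm
declare i1 @llvm.expect.i1(i1, i1)

; Hypothetical error handler.
declare void @handle_error()

define void @check(i32 %status) {
entry:
  %failed = icmp ne i32 %status, 0
  ; The result has the same value as %failed; false is the expected value.
  %failed.hint = call i1 @llvm.expect.i1(i1 %failed, i1 false)
  br i1 %failed.hint, label %error, label %done

error:
  call void @handle_error()
  br label %done

done:
  ret void
}
```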

‘llvm.expect.with.probability’ Intrinsic

Syntax:

This intrinsic is similar to llvm.expect. This is an overloaded intrinsic. You can use llvm.expect.with.probability on any integer bit width.

declare i1 @llvm.expect.with.probability.i1(i1 <val>, i1 <expected_val>, double <prob>)
declare i32 @llvm.expect.with.probability.i32(i32 <val>, i32 <expected_val>, double <prob>)
declare i64 @llvm.expect.with.probability.i64(i64 <val>, i64 <expected_val>, double <prob>)
Overview:

The llvm.expect.with.probability intrinsic provides information about the expected value of val with probability (or confidence) prob, which can be used by optimizers.

Arguments:

The llvm.expect.with.probability intrinsic takes three arguments. The first argument is a value. The second argument is an expected value. The third argument is a probability.

Semantics:

This intrinsic is lowered to val.
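A minimal sketch, assuming the probability is passed as a constant: the call below states that `%tag` is expected to equal 0 with probability 0.75. The function name `@is_cache_hit` and the chosen probability are purely illustrative.

```llvm
declare i64 @llvm.expect.with.probability.i64(i64, i64, double immarg)

; Hypothetical helper: %tag is expected to be 0 in about 75% of the calls.
define i1 @is_cache_hit(i64 %tag) {
entry:
  %tag.hint = call i64 @llvm.expect.with.probability.i64(
                  i64 %tag, i64 0, double 7.500000e-01)
  %hit = icmp eq i64 %tag.hint, 0
  ret i1 %hit
}
```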

‘llvm.assume’ Intrinsic

Syntax:
declare void @llvm.assume(i1 %cond)
Overview:

The llvm.assume intrinsic allows the optimizer to assume that the provided condition is true. This information can then be used in simplifying other parts of the code.

More complex assumptions can be encoded as assume operand bundles.

Arguments:

The argument of the call is the condition which the optimizer may assume is always true.

Semantics:

The intrinsic allows the optimizer to assume that the provided condition is always true whenever the control flow reaches the intrinsic call. No code is generated for this intrinsic, and instructions that contribute only to the provided condition are not used for code generation. If the condition is violated during execution, the behavior is undefined.

Note that the optimizer might limit the transformations performed on values used by the llvm.assume intrinsic in order to preserve the instructions only used to form the intrinsic’s input argument. This might prove undesirable if the extra information provided by the llvm.assume intrinsic does not cause sufficient overall improvement in code quality. For this reason, llvm.assume should not be used to document basic mathematical invariants that the optimizer can otherwise deduce or facts that are of little use to the optimizer.
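The sketch below shows an assumption that the optimizer cannot derive on its own: the caller promises that `%x` is non-negative, which would allow the signed division by 4 to be simplified (for instance to a logical shift). `@quarter` is a hypothetical name, and whether a given pass performs the simplification is not guaranteed.

```llvm
declare void @llvm.assume(i1)

; Hypothetical helper: behavior is undefined if %x is negative, because the
; assumed condition would then be violated.
define i32 @quarter(i32 %x) {
entry:
  %nonneg = icmp sge i32 %x, 0
  call void @llvm.assume(i1 %nonneg)
  %q = sdiv i32 %x, 4
  ret i32 %q
}
```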

‘llvm.ssa.copy’ Intrinsic

Syntax:
declare type @llvm.ssa.copy(type returned %operand) memory(none)
Arguments:

The first argument is an operand which is used as the returned value.

Overview:

The llvm.ssa.copy intrinsic can be used to attach information to operations by copying them and giving them new names. For example, the PredicateInfo utility uses it to build Extended SSA form, and attach various forms of information to operands that dominate specific uses. It is not meant for general use, only for building temporary renaming forms that require value splits at certain points.

‘llvm.type.test’ Intrinsic

Syntax:
declare i1 @llvm.type.test(ptr %ptr, metadata %type) nounwind memory(none)
Arguments:

The first argument is a pointer to be tested. The second argument is a metadata object representing a type identifier.

Overview:

The llvm.type.test intrinsic tests whether the given pointer is associated with the given type identifier.
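A sketch of how the result might guard an indirect call is shown below. The type identifier string `"_ZTSFvvE"`, the function name `@call_checked`, and the use of llvm.trap on failure are all assumptions made for illustration; in practice the identifier must match type metadata attached to the relevant globals.

```llvm
declare i1 @llvm.type.test(ptr, metadata)
declare void @llvm.trap() cold noreturn nounwind

; Hypothetical helper: calls %fn only if it is associated with the
; (assumed) type identifier "_ZTSFvvE"; otherwise traps.
define void @call_checked(ptr %fn) {
entry:
  %ok = call i1 @llvm.type.test(ptr %fn, metadata !"_ZTSFvvE")
  br i1 %ok, label %do.call, label %fail

do.call:
  call void %fn()
  ret void

fail:
  call void @llvm.trap()
  unreachable
}
```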

‘llvm.type.checked.load’ Intrinsic

Syntax:
declare {ptr, i1} @llvm.type.checked.load(ptr %ptr, i32 %offset, metadata %type) nounwind memory(argmem: read)
Arguments:

The first argument is a pointer from which to load a function pointer. The second argument is the byte offset from which to load the function pointer. The third argument is a metadata object representing a type identifier.

Overview:

The llvm.type.checked.load intrinsic safely loads a function pointer from a virtual table pointer using type metadata. This intrinsic is used to implement control flow integrity in conjunction with virtual call optimization. The virtual call optimization pass will optimize away llvm.type.checked.load intrinsics associated with devirtualized calls, thereby removing the type check in cases where it is not needed to enforce the control flow integrity constraint.

If the given pointer is associated with a type metadata identifier, this function returns true as the second element of its return value. (Note that the function may also return true if the given pointer is not associated with a type metadata identifier.) If the function’s return value’s second element is true, the following rules apply to the first element:

  • If the given pointer is associated with the given type metadata identifier, it is the function pointer loaded from the given byte offset from the given pointer.

  • If the given pointer is not associated with the given type metadata identifier, it is one of the following (the choice of which is unspecified):

    1. The function pointer that would have been loaded from an arbitrarily chosen (through an unspecified mechanism) pointer associated with the type metadata.

    2. If the function has a non-void return type, a pointer to a function that returns an unspecified value without causing side effects.

If the function’s return value’s second element is false, the value of the first element is undefined.

‘llvm.type.checked.load.relative’ Intrinsic

Syntax:
declare {ptr, i1} @llvm.type.checked.load.relative(ptr %ptr, i32 %offset, metadata %type) nounwind memory(argmem: read)
Overview:

The llvm.type.checked.load.relative intrinsic loads a relative pointer to a function from a virtual table pointer using metadata. Otherwise, its semantics are identical to those of the llvm.type.checked.load intrinsic.

A relative pointer is a pointer to an offset. This is the offset between the destination pointer and the original pointer. The address of the destination pointer is obtained by loading this offset and adding it to the original pointer. This calculation is the same as that of the llvm.load.relative intrinsic.

‘llvm.arithmetic.fence’ Intrinsic

Syntax:
declare <type> @llvm.arithmetic.fence(<type> <op>)
Overview:

The purpose of the llvm.arithmetic.fence intrinsic is to prevent the optimizer from performing fast-math optimizations, particularly reassociation, between the argument and the expression that contains the argument. It can be used to preserve the parentheses in the source language.

Arguments:

The llvm.arithmetic.fence intrinsic takes only one argument. The argument and the return value are floating-point numbers, or vector floating-point numbers, of the same type.

Semantics:

This intrinsic returns the value of its operand. The optimizer can optimize the argument, but the optimizer cannot hoist any component of the operand to the containing context, and the optimizer cannot move the calculation of any expression in the containing context into the operand.
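For example, a frontend that wants to honor the parentheses in a source expression such as `(a + b) + c` under reassociation-enabled fast-math flags could emit something like the sketch below. It assumes the `.f32` overload; `@sum_keep_parens` is a hypothetical name.

```llvm
declare float @llvm.arithmetic.fence.f32(float)

; Hypothetical helper computing (a + b) + c. Both additions allow
; reassociation, but the fence keeps %a + %b as a unit, so the optimizer
; may not rewrite the expression as a + (b + c).
define float @sum_keep_parens(float %a, float %b, float %c) {
entry:
  %ab = fadd reassoc float %a, %b
  %ab.fenced = call float @llvm.arithmetic.fence.f32(float %ab)
  %r = fadd reassoc float %ab.fenced, %c
  ret float %r
}
```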

‘llvm.donothing’ Intrinsic

Syntax:
declare void @llvm.donothing() nounwind memory(none)
Overview:

The llvm.donothing intrinsic doesn’t perform any operation. It’s one of only three intrinsics (besides llvm.experimental.patchpoint and llvm.experimental.gc.statepoint) that can be called with an invoke instruction.

Arguments:

None.

Semantics:

This intrinsic does nothing, and it’s removed by optimizers and ignored by codegen.

‘llvm.experimental.deoptimize’ Intrinsic

Syntax:
declare type @llvm.experimental.deoptimize(...) [ "deopt"(...) ]
Overview:

This intrinsic, together with deoptimization operand bundles, allows frontends to express transfer of control and frame-local state from the currently executing (typically more specialized, hence faster) version of a function into another (typically more generic, hence slower) version.

In languages with a fully integrated managed runtime like Java and JavaScript this intrinsic can be used to implement “uncommon trap” or “side exit” like functionality. In unmanaged languages like C and C++, this intrinsic can be used to represent the slow paths of specialized functions.

Arguments:

The intrinsic takes an arbitrary number of arguments, whose meaning is decided by the lowering strategy.

Semantics:

The @llvm.experimental.deoptimize intrinsic executes an attached deoptimization continuation (denoted using a deoptimization operand bundle) and returns the value returned by the deoptimization continuation. Defining the semantic properties of the continuation itself is out of scope of the language reference; as far as LLVM is concerned, the deoptimization continuation can invoke arbitrary side effects, including reading from and writing to the entire heap.

Deoptimization continuations expressed using "deopt" operand bundles always continue execution to the end of the physical frame containing them, so all calls to @llvm.experimental.deoptimize must be in “tail position”:

  • @llvm.experimental.deoptimize cannot be invoked.

  • The call must immediately precede a ret instruction.

  • The ret instruction must return the value produced by the @llvm.experimental.deoptimize call if there is one, or void.

Note that the above restrictions imply that the return type for a call to @llvm.experimental.deoptimize will match the return type of its immediate caller.

The inliner composes the "deopt" continuations of the caller into the "deopt" continuations present in the inlinee, and also updates calls to this intrinsic to return directly from the frame of the function it inlined into.

All declarations of @llvm.experimental.deoptimize must share the same calling convention.

Lowering:

Calls to @llvm.experimental.deoptimize are lowered to calls to the symbol __llvm_deoptimize (it is the frontend’s responsibility to ensure that this symbol is defined). The call arguments to @llvm.experimental.deoptimize are lowered as if they were formal arguments of the specified types, and not as varargs.
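The “tail position” rules can be seen in a small sketch. It assumes an `.i32` overload of the intrinsic so that its return type matches the caller's; the function `@fast_path_or_deopt`, its parameters, and the contents of the "deopt" bundle are hypothetical.

```llvm
declare i32 @llvm.experimental.deoptimize.i32(...)

; Hypothetical specialized function: the fast path runs while %assumption
; holds; otherwise control is transferred to the deoptimization continuation.
define i32 @fast_path_or_deopt(i32 %x, i1 %assumption) {
entry:
  br i1 %assumption, label %fast, label %slow

fast:
  %r = add i32 %x, 1
  ret i32 %r

slow:
  ; The call is not an invoke, it immediately precedes the ret, and the ret
  ; returns the value produced by the call.
  %v = call i32 (...) @llvm.experimental.deoptimize.i32(i32 %x) [ "deopt"(i32 %x) ]
  ret i32 %v
}
```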

‘llvm.experimental.guard’ Intrinsic

Syntax:
declare void @llvm.experimental.guard(i1, ...) [ "deopt"(...) ]
Overview:

This intrinsic, together with deoptimization operand bundles, allows frontends to express guards or checks on optimistic assumptions made during compilation. The semantics of @llvm.experimental.guard are defined in terms of @llvm.experimental.deoptimize; its body is defined to be equivalent to:

define void @llvm.experimental.guard(i1 %pred, <args...>) {
  %realPred = and i1 %pred, undef
  br i1 %realPred, label %continue, label %leave [, !make.implicit !{}]

leave:
  call void @llvm.experimental.deoptimize(<args...>) [ "deopt"() ]
  ret void

continue:
  ret void
}

with the optional [, !make.implicit !{}] present if and only if it is present on the call site. For more details on !make.implicit, see FaultMaps and implicit checks.

In words, @llvm.experimental.guard executes the attached "deopt" continuation if (but not only if) its first argument is false. Since the optimizer is allowed to replace the undef with an arbitrary value, it can optimize a guard to fail “spuriously”, i.e. without the original condition being false (hence the “not only if”); this allows for “check widening” type optimizations.

@llvm.experimental.guard cannot be invoked.

After @llvm.experimental.guard was first added, a more general formulation was found in @llvm.experimental.widenable.condition. Support for @llvm.experimental.guard is slowly being rephrased in terms of this alternative.

‘llvm.experimental.widenable.condition’ Intrinsic

Syntax:
declare i1 @llvm.experimental.widenable.condition()
Overview:

This intrinsic represents a “widenable condition”, which is a boolean expression with the following property: whether this expression is true or false, the program is correct and well-defined.

Together with deoptimization operand bundles, @llvm.experimental.widenable.condition allows frontends to express guards or checks on optimistic assumptions made during compilation and represent them as branch instructions on special conditions.

While this may appear similar in semantics to undef, it is very different in that an invocation produces a particular, singular value. It is also intended to be lowered late, and remain available for specific optimizations and transforms that can benefit from its special properties.

Arguments:

None.

Semantics:

The intrinsic @llvm.experimental.widenable.condition() returns either true or false. For each evaluation of a call to this intrinsic, the program must be valid and correct both if it returns true and if it returns false. This allows transformation passes to replace evaluations of this intrinsic with either value whenever one is beneficial.

When used in a branch condition, it allows us to choose betweentwo alternative correct solutions for the same problem, likein example below:

  %cond = call i1 @llvm.experimental.widenable.condition()  br i1 %cond, label %fast_path, label %slow_pathfast_path:  ; Apply memory-consuming but fast solution for a task.slow_path:  ; Cheap in memory but slow solution.

Whether the result of intrinsic’s call istrue orfalse,it should be correct to pick either solution. We can switchbetween them by replacing the result of@llvm.experimental.widenable.condition with differenti1 expressions.

This is how it can be used to represent guards as widenable branches:

block:
  ; Unguarded instructions
  call void @llvm.experimental.guard(i1 %cond, <args...>) ["deopt"(<deopt_args...>)]
  ; Guarded instructions

This can be expressed in an equivalent form as an explicit branch using @llvm.experimental.widenable.condition:

block:
  ; Unguarded instructions
  %widenable_condition = call i1 @llvm.experimental.widenable.condition()
  %guard_condition = and i1 %cond, %widenable_condition
  br i1 %guard_condition, label %guarded, label %deopt

guarded:
  ; Guarded instructions

deopt:
  call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

So the block guarded is only reachable when %cond is true, and it should be valid to go to the block deopt whenever %cond is true or false.

@llvm.experimental.widenable.condition will never throw, thus it cannot be invoked.

Guard widening:

When @llvm.experimental.widenable.condition() is used in the condition of a guard represented as an explicit branch, it is legal to widen the guard’s condition with any additional conditions.

Guard widening looks like replacement of

%widenable_cond = call i1 @llvm.experimental.widenable.condition()
%guard_cond = and i1 %cond, %widenable_cond
br i1 %guard_cond, label %guarded, label %deopt

with

%widenable_cond = call i1 @llvm.experimental.widenable.condition()
%new_cond = and i1 %any_other_cond, %widenable_cond
%new_guard_cond = and i1 %cond, %new_cond
br i1 %new_guard_cond, label %guarded, label %deopt

for this branch. Here %any_other_cond is an arbitrarily chosen well-defined i1 value. By widening the guard, we may impose stricter conditions on the guarded block and bail to the deopt block when the new condition is not met.

Lowering:

The default lowering strategy is to replace the result of a call to @llvm.experimental.widenable.condition with the constant true. However, it is always correct to replace it with any other i1 value. Any pass can freely do so if it can benefit from non-default lowering.

‘llvm.allow.ubsan.check’ Intrinsic

Syntax:
declare i1 @llvm.allow.ubsan.check(i8 immarg %kind)
Overview:

This intrinsic returns true if and only if the compiler opted to enable the ubsan check in the current basic block.

Rules to allow ubsan checks are not part of the intrinsic declaration, and are controlled by compiler options.

This intrinsic is the ubsan-specific version of @llvm.allow.runtime.check().

Arguments:

An integer describing the kind of ubsan check guarded by the intrinsic.

Semantics:

The intrinsic @llvm.allow.ubsan.check() returns either true or false, depending on compiler options.

For each evaluation of a call to this intrinsic, the program must be valid and correct both if it returns true and if it returns false.

When used in a branch condition, it selects one of the two paths:

  • true: Executes the UBSan check and reports any failures.

  • false: Bypasses the check, assuming it always succeeds.

Example:

  %allow = call i1 @llvm.allow.ubsan.check(i8 5)
  %not.allow = xor i1 %allow, true
  %cond = or i1 %ubcheck, %not.allow
  br i1 %cond, label %cont, label %trap

cont:
  ; Proceed

trap:
  call void @llvm.ubsantrap(i8 5)
  unreachable

‘llvm.allow.runtime.check’ Intrinsic

Syntax:
declare i1 @llvm.allow.runtime.check(metadata %kind)
Overview:

This intrinsic returns true if and only if the compiler opted to enable runtime checks in the current basic block.

Rules to allow runtime checks are not part of the intrinsic declaration, and are controlled by compiler options.

This intrinsic is the non-ubsan-specific version of @llvm.allow.ubsan.check().

Arguments:

A string identifying the kind of runtime check guarded by the intrinsic. The string can be used to control rules to allow checks.

Semantics:

The intrinsic @llvm.allow.runtime.check() returns either true or false, depending on compiler options.

For each evaluation of a call to this intrinsic, the program must be valid and correct both if it returns true and if it returns false.

When used in a branch condition, it allows us to choose between two alternative correct solutions for the same problem.

If the intrinsic is evaluated as true, the program should execute a guarded check. If the intrinsic is evaluated as false, the program should avoid any unnecessary checks.

Example:

  %allow = call i1 @llvm.allow.runtime.check(metadata !"my_check")
  br i1 %allow, label %fast_path, label %slow_path

fast_path:
  ; Omit diagnostics.

slow_path:
  ; Additional diagnostics.

‘llvm.load.relative’ Intrinsic

Syntax:
declare ptr @llvm.load.relative.iN(ptr %ptr, iN %offset) nounwind memory(argmem: read)
Overview:

This intrinsic loads a 32-bit value from the address %ptr + %offset, adds %ptr to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form i32 trunc(x - %ptr), the intrinsic call is folded to x.

LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if x is an unnamed_addr function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant.
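As a minimal sketch of the pattern described above, the module below stores a function address as a 32-bit offset relative to the table that holds it, then recovers the address with llvm.load.relative. The names @target, @rel_table, and @lookup are hypothetical, and the example assumes a target with 64-bit pointers.

```llvm
; Hypothetical function whose address is stored as a relative offset.
define void @target() unnamed_addr {
  ret void
}

; One 32-bit entry holding (@target - @rel_table), truncated to i32.
@rel_table = private unnamed_addr constant [1 x i32] [
  i32 trunc (i64 sub (i64 ptrtoint (ptr @target to i64),
                      i64 ptrtoint (ptr @rel_table to i64)) to i32)
], align 4

declare ptr @llvm.load.relative.i32(ptr, i32)

define ptr @lookup() {
  ; Loads the i32 at @rel_table + 0 and adds @rel_table to it.
  %fn = call ptr @llvm.load.relative.i32(ptr @rel_table, i32 0)
  ret ptr %fn
}
```

Because the initializer has the form i32 trunc(x - %ptr), this is the kind of call the constant folder may be able to replace with @target directly.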

‘llvm.sideeffect’ Intrinsic

Syntax:
declare void @llvm.sideeffect() inaccessiblememonly nounwind willreturn
Overview:

The llvm.sideeffect intrinsic doesn’t perform any operation. Optimizers treat it as having side effects, so it can be inserted into a loop to indicate that the loop shouldn’t be assumed to terminate (which could potentially lead to the loop being optimized away entirely), even if it’s an infinite loop with no other side effects.

Arguments:

None.

Semantics:

This intrinsic actually does nothing, but optimizers must assume that it has externally observable side effects.
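A minimal sketch of the loop use case mentioned in the overview follows. The function name @spin_forever is hypothetical.

```llvm
declare void @llvm.sideeffect()

define void @spin_forever() {
entry:
  br label %loop

loop:
  ; No-op at run time, but treated as an observable side effect, so the
  ; loop is not assumed to terminate.
  call void @llvm.sideeffect()
  br label %loop
}
```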

‘llvm.is.constant.*’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.is.constant with any argument type.

declare i1 @llvm.is.constant.i32(i32 %operand) nounwind memory(none)
declare i1 @llvm.is.constant.f32(float %operand) nounwind memory(none)
declare i1 @llvm.is.constant.TYPENAME(TYPE %operand) nounwind memory(none)
Overview:

The ‘llvm.is.constant’ intrinsic will return true if the argument is known to be a manifest compile-time constant. It is guaranteed to fold to either true or false before generating machine code.

Semantics:

This intrinsic generates no code. If its argument is known to be a manifest compile-time constant value, then the intrinsic will be converted to a constant true value. Otherwise, it will be converted to a constant false value.

In particular, note that if the argument is a constant expression which refers to a global (the address of which _is_ a constant, but not manifest during the compile), then the intrinsic evaluates to false.

The result also intentionally depends on the result of optimization passes; e.g., the result can change depending on whether a function gets inlined or not. A function’s parameters are obviously not constant. However, a call like llvm.is.constant.i32(i32 %param) can return true after the function is inlined, if the value passed to the function parameter was a constant.
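The inlining dependence can be illustrated with a small sketch. The function names @is_const_arg and @caller are hypothetical; whether the call in @caller ultimately folds to true depends on whether the optimizer inlines @is_const_arg before the intrinsic is resolved.

```llvm
declare i1 @llvm.is.constant.i32(i32)

; Seen in isolation, %param is not a manifest constant.
define internal i1 @is_const_arg(i32 %param) {
  %c = call i1 @llvm.is.constant.i32(i32 %param)
  ret i1 %c
}

; If @is_const_arg is inlined here, the intrinsic's operand becomes the
; literal 42 and the result can fold to true.
define i1 @caller() {
  %r = call i1 @is_const_arg(i32 42)
  ret i1 %r
}
```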

‘llvm.ptrmask’ Intrinsic

Syntax:
declare ptrty llvm.ptrmask(ptrty %ptr, intty %mask) speculatable memory(none)
Arguments:

The first argument is a pointer or vector of pointers. The second argument is an integer or vector of integers with the same bit width as the index type size of the first argument.

Overview:

The llvm.ptrmask intrinsic masks out bits of the pointer according to a mask. This allows stripping data from tagged pointers without converting them to an integer (ptrtoint/inttoptr). As a consequence, we can preserve more information to facilitate alias analysis and underlying-object detection.

Semantics:

The result of ptrmask(%ptr, %mask) is equivalent to the following expansion, where iPtrIdx is the index type size of the pointer:

%intptr = ptrtoint ptr %ptr to iPtrIdx ; this may truncate
%masked = and iPtrIdx %intptr, %mask
%diff = sub iPtrIdx %masked, %intptr
%result = getelementptr i8, ptr %ptr, iPtrIdx %diff

If the pointer index type size is smaller than the pointer type size, this implies that pointer bits beyond the index size are not affected by this intrinsic. For integral pointers, it behaves as if the mask were extended with 1 bits to the pointer type size.

Both the returned pointer(s) and the first argument are based on the same underlying object (for more information on the based on terminology see the pointer aliasing rules).

The intrinsic only captures the pointer argument through the return value.
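As a hedged sketch of the tagged-pointer use case, the function below clears the low three bits of a pointer before loading through it. It assumes address space 0 with a 64-bit index type (hence the .p0.i64 suffix) and a hypothetical tagging scheme that keeps its tag in those low bits; @load_untagged is an illustrative name.

```llvm
declare ptr @llvm.ptrmask.p0.i64(ptr, i64)

define i32 @load_untagged(ptr %tagged) {
  ; -8 is ...11111000 in binary: clears the low three bits, keeps the rest.
  %clean = call ptr @llvm.ptrmask.p0.i64(ptr %tagged, i64 -8)
  %v = load i32, ptr %clean, align 4
  ret i32 %v
}
```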

‘llvm.threadlocal.address’ Intrinsic

Syntax:
declare ptr @llvm.threadlocal.address(ptr) nounwind willreturn memory(none)
Arguments:

The llvm.threadlocal.address intrinsic requires a global value argument (a global variable or alias) that is thread local.

Semantics:

The address of a thread local global is not a constant, since it depends on the calling thread. The llvm.threadlocal.address intrinsic returns the address of the given thread local global in the calling thread.
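A minimal sketch of reading a thread-local variable through the intrinsic is shown below. The names @counter and @read_counter are hypothetical, and the .p0 suffix assumes the intrinsic is mangled on a pointer in address space 0.

```llvm
@counter = thread_local global i32 0, align 4

declare ptr @llvm.threadlocal.address.p0(ptr)

define i32 @read_counter() {
  ; Resolve the address of @counter for the calling thread, then load.
  %addr = call ptr @llvm.threadlocal.address.p0(ptr @counter)
  %v = load i32, ptr %addr, align 4
  ret i32 %v
}
```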

‘llvm.vscale’ Intrinsic

Syntax:
declare i32 llvm.vscale.i32()
declare i64 llvm.vscale.i64()
Overview:

The llvm.vscale intrinsic returns the value for vscale in scalable vectors such as <vscale x 16 x i8>.

Semantics:

vscale is a positive value that is constant throughout program execution, but is unknown at compile time. If the result value does not fit in the result type, then the result is a poison value.
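For illustration, the following sketch computes the run-time element count of a <vscale x 16 x i8> vector by scaling the minimum element count by vscale. The function name @num_elements_nxv16i8 is hypothetical.

```llvm
declare i64 @llvm.vscale.i64()

define i64 @num_elements_nxv16i8() {
  %vs = call i64 @llvm.vscale.i64()
  ; A <vscale x 16 x i8> vector holds 16 * vscale elements.
  %n = mul i64 %vs, 16
  ret i64 %n
}
```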

‘llvm.fake.use’ Intrinsic

Syntax:
declare void @llvm.fake.use(...)
Overview:

The llvm.fake.use intrinsic is a no-op. It takes a single value as an operand and is treated as a use of that operand, to force the optimizer to preserve that value prior to the fake use. This is used for extending the lifetimes of variables, where this intrinsic placed at the end of a variable’s scope helps prevent that variable from being optimized out.

Arguments:

The llvm.fake.use intrinsic takes one argument, which may be any function-local SSA value. Note that the signature is variadic so that the intrinsic can take any type of argument, but passing more than one argument will result in an error.

Semantics:

This intrinsic does nothing, but optimizers must consider it a use of its single operand and should try to preserve the intrinsic and its position in the function.
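A small sketch of the end-of-scope placement described above follows. The functions @work and @f are hypothetical; because the declaration is variadic, the call spells out the function type as void (...).

```llvm
declare void @llvm.fake.use(...)
declare void @work()

define void @f(i32 %x) {
  call void @work()
  ; %x has no real use after this point; the fake use keeps it alive
  ; until the end of the function.
  call void (...) @llvm.fake.use(i32 %x)
  ret void
}
```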

Stack Map Intrinsics

LLVM provides experimental intrinsics to support runtime patching mechanisms commonly desired in dynamic language JITs. These intrinsics are described in Stack maps and patch points in LLVM.

Element Wise Atomic Memory Intrinsics

These intrinsics are similar to the standard library memory intrinsics except that they perform memory transfer as a sequence of atomic memory accesses.

‘llvm.memcpy.element.unordered.atomic’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memcpy.element.unordered.atomic on any integer bit width and for different address spaces. Not all targets support all bit widths, however.

declare void @llvm.memcpy.element.unordered.atomic.p0.p0.i32(ptr <dest>, ptr <src>, i32 <len>, i32 <element_size>)
declare void @llvm.memcpy.element.unordered.atomic.p0.p0.i64(ptr <dest>, ptr <src>, i64 <len>, i32 <element_size>)
Overview:

The ‘llvm.memcpy.element.unordered.atomic.*’ intrinsic is a specialization of the ‘llvm.memcpy.*’ intrinsic. It differs in that the dest and src are treated as arrays with elements that are exactly element_size bytes, and the copy between buffers uses a sequence of unordered atomic load/store operations that are a positive integer multiple of the element_size in size.

Arguments:

The first three arguments are the same as they are in the @llvm.memcpy intrinsic, with the added constraint that len is required to be a positive integer multiple of the element_size. If len is not a positive integer multiple of element_size, then the behavior of the intrinsic is undefined.

element_size must be a compile-time constant positive power of two no greater than a target-specific atomic access size limit.

For each of the input pointers, the align parameter attribute must be specified. It must be a power of two no less than the element_size. The caller guarantees that both the source and destination pointers are aligned to that boundary.

Semantics:

The ‘llvm.memcpy.element.unordered.atomic.*’ intrinsic copies len bytes of memory from the source location to the destination location. These locations are not allowed to overlap. The memory copy is performed as a sequence of load/store operations where each access is guaranteed to be a multiple of element_size bytes wide and aligned at an element_size boundary.

The order of the copy is unspecified. The same value may be read from the source buffer many times, but only one write is issued to the destination buffer per element. It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are unordered atomic when specified.

This intrinsic does not provide any additional ordering guarantees over those provided by a set of unordered loads from the source location and stores to the destination.

Lowering:

In the most general case, a call to the ‘llvm.memcpy.element.unordered.atomic.*’ intrinsic is lowered to a call to the symbol __llvm_memcpy_element_unordered_atomic_*, where ‘*’ is replaced with the actual element size. See RewriteStatepointsForGC intrinsic lowering for details on GC-specific lowering.

The optimizer is allowed to inline the memory copy when it’s profitable to do so.
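As a minimal sketch of a call that satisfies the argument constraints above, the function below copies four 4-byte elements. The name @copy_four_words is hypothetical, and the caller is assumed to guarantee that %dst and %src are 4-byte aligned and do not overlap.

```llvm
declare void @llvm.memcpy.element.unordered.atomic.p0.p0.i64(ptr, ptr, i64, i32)

define void @copy_four_words(ptr %dst, ptr %src) {
  ; len = 16 is a positive multiple of element_size = 4, and both
  ; pointers carry an align attribute no less than element_size.
  call void @llvm.memcpy.element.unordered.atomic.p0.p0.i64(
      ptr align 4 %dst, ptr align 4 %src, i64 16, i32 4)
  ret void
}
```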

‘llvm.memmove.element.unordered.atomic’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memmove.element.unordered.atomic on any integer bit width and for different address spaces. Not all targets support all bit widths, however.

declare void @llvm.memmove.element.unordered.atomic.p0.p0.i32(ptr <dest>, ptr <src>, i32 <len>, i32 <element_size>)
declare void @llvm.memmove.element.unordered.atomic.p0.p0.i64(ptr <dest>, ptr <src>, i64 <len>, i32 <element_size>)
Overview:

The ‘llvm.memmove.element.unordered.atomic.*’ intrinsic is a specialization of the ‘llvm.memmove.*’ intrinsic. It differs in that the dest and src are treated as arrays with elements that are exactly element_size bytes, and the copy between buffers uses a sequence of unordered atomic load/store operations that are a positive integer multiple of the element_size in size.

Arguments:

The first three arguments are the same as they are in the @llvm.memmove intrinsic, with the added constraint that len is required to be a positive integer multiple of the element_size. If len is not a positive integer multiple of element_size, then the behavior of the intrinsic is undefined.

element_size must be a compile-time constant positive power of two no greater than a target-specific atomic access size limit.

For each of the input pointers, the align parameter attribute must be specified. It must be a power of two no less than the element_size. The caller guarantees that both the source and destination pointers are aligned to that boundary.

Semantics:

The ‘llvm.memmove.element.unordered.atomic.*’ intrinsic copies len bytes of memory from the source location to the destination location. These locations are allowed to overlap. The memory copy is performed as a sequence of load/store operations where each access is guaranteed to be a multiple of element_size bytes wide and aligned at an element_size boundary.

The order of the copy is unspecified. The same value may be read from the source buffer many times, but only one write is issued to the destination buffer per element. It is well defined to have concurrent reads and writes to both source and destination provided those reads and writes are unordered atomic when specified.

This intrinsic does not provide any additional ordering guarantees over those provided by a set of unordered loads from the source location and stores to the destination.

Lowering:

In the most general case, a call to the ‘llvm.memmove.element.unordered.atomic.*’ intrinsic is lowered to a call to the symbol __llvm_memmove_element_unordered_atomic_*, where ‘*’ is replaced with the actual element size. See RewriteStatepointsForGC intrinsic lowering for details on GC-specific lowering.

The optimizer is allowed to inline the memory copy when it’s profitable to do so.
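Since overlap is permitted here, a sketch of an overlapping move may help. The function below shifts three 8-byte elements down by one slot within the same buffer; @shift_down_one is a hypothetical name, and the caller is assumed to guarantee that %buf is 8-byte aligned and at least 32 bytes long.

```llvm
declare void @llvm.memmove.element.unordered.atomic.p0.p0.i64(ptr, ptr, i64, i32)

define void @shift_down_one(ptr %buf) {
  ; The source starts one 8-byte element after the destination, so the
  ; two 24-byte ranges overlap.
  %src = getelementptr i8, ptr %buf, i64 8
  call void @llvm.memmove.element.unordered.atomic.p0.p0.i64(
      ptr align 8 %buf, ptr align 8 %src, i64 24, i32 8)
  ret void
}
```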

‘llvm.memset.element.unordered.atomic’ Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memset.element.unordered.atomic on any integer bit width and for different address spaces. Not all targets support all bit widths, however.

declare void @llvm.memset.element.unordered.atomic.p0.i32(ptr <dest>, i8 <value>, i32 <len>, i32 <element_size>)
declare void @llvm.memset.element.unordered.atomic.p0.i64(ptr <dest>, i8 <value>, i64 <len>, i32 <element_size>)
Overview:

The ‘llvm.memset.element.unordered.atomic.*’ intrinsic is a specialization of the ‘llvm.memset.*’ intrinsic. It differs in that the dest is treated as an array with elements that are exactly element_size bytes, and the assignment to that array uses a sequence of unordered atomic store operations that are a positive integer multiple of the element_size in size.

Arguments:

The first three arguments are the same as they are in the @llvm.memset intrinsic, with the added constraint that len is required to be a positive integer multiple of the element_size. If len is not a positive integer multiple of element_size, then the behavior of the intrinsic is undefined.

element_size must be a compile-time constant positive power of two no greater than a target-specific atomic access size limit.

The dest input pointer must have the align parameter attribute specified. It must be a power of two no less than the element_size. The caller guarantees that the destination pointer is aligned to that boundary.

Semantics:

The ‘llvm.memset.element.unordered.atomic.*’ intrinsic sets the len bytes of memory starting at the destination location to the given value. The memory is set with a sequence of store operations where each access is guaranteed to be a multiple of element_size bytes wide and aligned at an element_size boundary.

The order of the assignment is unspecified. Only one write is issued to the destination buffer per element. It is well defined to have concurrent reads and writes to the destination provided those reads and writes are unordered atomic when specified.

This intrinsic does not provide any additional ordering guarantees over those provided by a set of unordered stores to the destination.

Lowering:

In the most general case, a call to the ‘llvm.memset.element.unordered.atomic.*’ intrinsic is lowered to a call to the symbol __llvm_memset_element_unordered_atomic_*, where ‘*’ is replaced with the actual element size.

The optimizer is allowed to inline the memory assignment when it’s profitable to do so.
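A minimal sketch of a conforming call follows: it zeroes eight 4-byte elements. The name @zero_eight_words is hypothetical, and the caller is assumed to guarantee that %dst is 4-byte aligned.

```llvm
declare void @llvm.memset.element.unordered.atomic.p0.i64(ptr, i8, i64, i32)

define void @zero_eight_words(ptr %dst) {
  ; len = 32 is a positive multiple of element_size = 4.
  call void @llvm.memset.element.unordered.atomic.p0.i64(
      ptr align 4 %dst, i8 0, i64 32, i32 4)
  ret void
}
```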

Objective-C ARC Runtime Intrinsics

LLVM provides intrinsics that lower to Objective-C ARC runtime entry points. LLVM is aware of the semantics of these functions, and optimizes based on that knowledge. You can read more about the details of Objective-C ARC here.

‘llvm.objc.autorelease’ Intrinsic

Syntax:
declare ptr @llvm.objc.autorelease(ptr)
Lowering:

Lowers to a call to objc_autorelease.

‘llvm.objc.autoreleasePoolPop’ Intrinsic

Syntax:
declare void @llvm.objc.autoreleasePoolPop(ptr)
Lowering:

Lowers to a call to objc_autoreleasePoolPop.

‘llvm.objc.autoreleasePoolPush’ Intrinsic

Syntax:
declare ptr @llvm.objc.autoreleasePoolPush()
Lowering:

Lowers to a call to objc_autoreleasePoolPush.

‘llvm.objc.autoreleaseReturnValue’ Intrinsic

Syntax:
declare ptr @llvm.objc.autoreleaseReturnValue(ptr)
Lowering:

Lowers to a call to objc_autoreleaseReturnValue.

‘llvm.objc.copyWeak’ Intrinsic

Syntax:
declare void @llvm.objc.copyWeak(ptr, ptr)
Lowering:

Lowers to a call to objc_copyWeak.

‘llvm.objc.destroyWeak’ Intrinsic

Syntax:
declare void @llvm.objc.destroyWeak(ptr)
Lowering:

Lowers to a call to objc_destroyWeak.

‘llvm.objc.initWeak’ Intrinsic

Syntax:
declare ptr @llvm.objc.initWeak(ptr, ptr)
Lowering:

Lowers to a call to objc_initWeak.

‘llvm.objc.loadWeak’ Intrinsic

Syntax:
declare ptr @llvm.objc.loadWeak(ptr)
Lowering:

Lowers to a call to objc_loadWeak.

‘llvm.objc.loadWeakRetained’ Intrinsic

Syntax:
declare ptr @llvm.objc.loadWeakRetained(ptr)
Lowering:

Lowers to a call to objc_loadWeakRetained.

‘llvm.objc.moveWeak’ Intrinsic

Syntax:
declare void @llvm.objc.moveWeak(ptr, ptr)
Lowering:

Lowers to a call to objc_moveWeak.

‘llvm.objc.release’ Intrinsic

Syntax:
declare void @llvm.objc.release(ptr)
Lowering:

Lowers to a call to objc_release.

‘llvm.objc.retain’ Intrinsic

Syntax:
declare ptr @llvm.objc.retain(ptr)
Lowering:

Lowers to a call to objc_retain.

‘llvm.objc.retainAutorelease’ Intrinsic

Syntax:
declare ptr @llvm.objc.retainAutorelease(ptr)
Lowering:

Lowers to a call to objc_retainAutorelease.

‘llvm.objc.retainAutoreleaseReturnValue’ Intrinsic

Syntax:
declare ptr @llvm.objc.retainAutoreleaseReturnValue(ptr)
Lowering:

Lowers to a call to objc_retainAutoreleaseReturnValue.

‘llvm.objc.retainAutoreleasedReturnValue’ Intrinsic

Syntax:
declare ptr @llvm.objc.retainAutoreleasedReturnValue(ptr)
Lowering:

Lowers to a call to objc_retainAutoreleasedReturnValue.

‘llvm.objc.retainBlock’ Intrinsic

Syntax:
declare ptr @llvm.objc.retainBlock(ptr)
Lowering:

Lowers to a call to objc_retainBlock.

‘llvm.objc.storeStrong’ Intrinsic

Syntax:
declare void @llvm.objc.storeStrong(ptr, ptr)
Lowering:

Lowers to a call to objc_storeStrong.

‘llvm.objc.storeWeak’ Intrinsic

Syntax:
declare ptr @llvm.objc.storeWeak(ptr, ptr)
Lowering:

Lowers to a call to objc_storeWeak.
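As an illustrative sketch of how a frontend might emit these intrinsics, the function below brackets a use of an object with a balanced retain and release. The names @use_object and @with_retained are hypothetical.

```llvm
declare ptr @llvm.objc.retain(ptr)
declare void @llvm.objc.release(ptr)
declare void @use_object(ptr) ; hypothetical consumer of the object

define void @with_retained(ptr %obj) {
  ; Take ownership for the duration of the call, then give it back.
  %retained = call ptr @llvm.objc.retain(ptr %obj)
  call void @use_object(ptr %retained)
  call void @llvm.objc.release(ptr %retained)
  ret void
}
```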

Preserving Debug Information Intrinsics

These intrinsics are used to carry certain debuginfo together with IR-level operations. For example, it may be desirable to know the structure/union name and the original user-level field indices. Such information is lost in the IR GetElementPtr instruction, since the IR types are different from debuginfo types and unions are converted to structs in IR.

‘llvm.preserve.array.access.index’ Intrinsic

Syntax:
declare <ret_type> @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(<type> base, i32 dim, i32 index)
Overview:

The ‘llvm.preserve.array.access.index’ intrinsic returns the getelementptr address based on array base base, array dimension dim and the last access index index into the array. The return type ret_type is a pointer type to the array element. The array dim and index are preserved, which is more robust than a getelementptr instruction, which may be subject to compiler transformation. The llvm.preserve.access.index type of metadata is attached to this call instruction to provide array or pointer debuginfo type. The metadata is a DICompositeType or DIDerivedType representing the debuginfo version of type.

Arguments:

The base is the array base address. The dim is the array dimension. The base is a pointer if dim equals 0. The index is the last access index into the array or pointer.

The base argument must be annotated with an elementtype attribute at the call-site. This attribute specifies the getelementptr element type.

Semantics:

The ‘llvm.preserve.array.access.index’ intrinsic produces the same result as a getelementptr with base base and access operands {dim's 0's, index}.

‘llvm.preserve.union.access.index’ Intrinsic

Syntax:
declare <type> @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(<type> base, i32 di_index)
Overview:

The ‘llvm.preserve.union.access.index’ intrinsic carries the debuginfo field index di_index and returns the base address. The llvm.preserve.access.index type of metadata is attached to this call instruction to provide union debuginfo type. The metadata is a DICompositeType representing the debuginfo version of type. The return type type is the same as the base type.

Arguments:

The base is the union base address. The di_index is the field index in debuginfo.

Semantics:

The ‘llvm.preserve.union.access.index’ intrinsic returns the base address.

‘llvm.preserve.struct.access.index’ Intrinsic

Syntax:
declare <ret_type> @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(<type> base, i32 gep_index, i32 di_index)
Overview:

The ‘llvm.preserve.struct.access.index’ intrinsic returns the getelementptr address based on struct base base and IR struct member index gep_index. The llvm.preserve.access.index type of metadata is attached to this call instruction to provide struct debuginfo type. The metadata is a DICompositeType representing the debuginfo version of type. The return type ret_type is a pointer type to the structure member.

Arguments:

The base is the structure base address. The gep_index is the struct member index based on IR structures. The di_index is the struct member index based on debuginfo.

The base argument must be annotated with an elementtype attribute at the call-site. This attribute specifies the getelementptr element type.

Semantics:

The ‘llvm.preserve.struct.access.index’ intrinsic produces the same result as a getelementptr with base base and access operands {0, gep_index}.

‘llvm.fptrunc.round’ Intrinsic

Syntax:
declare <ty2> @llvm.fptrunc.round(<type> <value>, metadata <rounding mode>)
Overview:

The ‘llvm.fptrunc.round’ intrinsic truncates floating-point value to type ty2 with a specified rounding mode.

Arguments:

The ‘llvm.fptrunc.round’ intrinsic takes a floating-point value to cast and a floating-point type to cast it to. This argument must be larger in size than the result.

The second argument specifies the rounding mode as described in the constrained intrinsics section. For this intrinsic, the “round.dynamic” mode is not supported.

Semantics:

The ‘llvm.fptrunc.round’ intrinsic casts a value from a larger floating-point type to a smaller floating-point type. This intrinsic is assumed to execute in the default floating-point environment except for the rounding mode. This intrinsic is not supported on all targets. Some targets may not support all rounding modes.
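A minimal sketch of a float-to-half truncation that rounds toward zero is shown below. The function name @trunc_toward_zero is hypothetical, the .f16.f32 suffix assumes the usual overload mangling on result and operand types, and, as noted above, a given target may not support this intrinsic or this rounding mode.

```llvm
declare half @llvm.fptrunc.round.f16.f32(float, metadata)

define half @trunc_toward_zero(float %x) {
  ; The rounding mode is passed as a metadata string.
  %r = call half @llvm.fptrunc.round.f16.f32(float %x, metadata !"round.tozero")
  ret half %r
}
```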