Clang Compiler User’s Manual

Introduction

The Clang Compiler is an open-source compiler for the C family of programming languages, aiming to be the best-in-class implementation of these languages. Clang builds on the LLVM optimizer and code generator, allowing it to provide high-quality optimization and code generation support for many targets. For more general information, please see the Clang Web Site or the LLVM Web Site.

This document describes important notes about using Clang as a compiler for an end user, documenting the supported features, command line options, etc. If you are interested in using Clang to build a tool that processes code, please see the “Clang” CFE Internals Manual. If you are interested in the Clang Static Analyzer, please see its web page.

Clang is one component in a complete toolchain for C family languages. A separate document describes the other pieces necessary to assemble a complete toolchain.

Clang is designed to support the C family of programming languages, which includes C, Objective-C, C++, and Objective-C++ as well as many dialects of those. For language-specific information, please see the corresponding language-specific section.

In addition to these base languages and their dialects, Clang supports a broad variety of language extensions, which are documented in the corresponding language section. These extensions are provided to be compatible with the GCC, Microsoft, and other popular compilers as well as to improve functionality through Clang-specific features. The Clang driver and language features are intentionally designed to be as compatible with the GNU GCC compiler as reasonably possible, easing migration from GCC to Clang. In most cases, code “just works”. Clang also provides an alternative driver, clang-cl, that is designed to be compatible with the Visual C++ compiler, cl.exe.

In addition to language-specific features, Clang has a variety of features that depend on what CPU architecture or operating system is being compiled for. Please see the Target-Specific Features and Limitations section for more details.

The rest of the introduction introduces some basic compiler terminology that is used throughout this manual and contains a basic introduction to using Clang as a command line compiler.

Terminology

Front end, parser, backend, preprocessor, undefined behavior, diagnostic, optimizer

Basic Usage

Intro to how to use a C compiler for newbies.

compile + link, compile then link, debug info, enabling optimizations, picking a language to use (defaults to C17; autosenses based on extension), using a makefile

Command Line Options

This section is generally an index into other sections. It does not go into depth on the ones that are covered by other sections. However, the first part introduces the language selection and other high level options like -c, -g, etc.

Options to Control Error and Warning Messages

-Werror

Turn warnings into errors.

-Werror=foo

Turn warning “foo” into an error.

-Wno-error=foo

Turn warning “foo” into a warning even if -Werror is specified.

-Wfoo

Enable warning “foo”. See the diagnostics reference for a complete list of the warning flags that can be specified in this way.

-Wno-foo

Disable warning “foo”.

-w

Disable all diagnostics.

-Weverything

Enable all diagnostics.

-pedantic

Warn on language extensions.

-pedantic-errors

Error on language extensions.

-Wsystem-headers

Enable warnings from system headers.

-ferror-limit=123

Stop emitting diagnostics after 123 errors have been produced. The default is 20, and the error limit can be disabled with -ferror-limit=0.

-ftemplate-backtrace-limit=123

Only emit up to 123 template instantiation notes within the template instantiation backtrace for a single warning or error. The default is 10, and the limit can be disabled with -ftemplate-backtrace-limit=0.

--warning-suppression-mappings=foo.txt

Suppress certain diagnostics for certain files.

Formatting of Diagnostics

Clang aims to produce beautiful diagnostics by default, particularly for new users that first come to Clang. However, different people have different preferences, and sometimes Clang is driven not by a human, but by a program that wants consistent and easily parsable output. For these cases, Clang provides a wide range of options to control the exact output format of the diagnostics that it generates.

-f[no-]show-column

Print column number in diagnostic.

This option, which defaults to on, controls whether or not Clang prints the column number of a diagnostic. For example, when this is enabled, Clang will print something like:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

When this is disabled, Clang will print “test.c:28: warning…” with no column number.

The printed column numbers count bytes from the beginning of the line; take care if your source contains multibyte characters.

-f[no-]show-source-location

Print source file/line/column information in diagnostic.

This option, which defaults to on, controls whether or not Clang prints the filename, line number and column number of a diagnostic. For example, when this is enabled, Clang will print something like:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

When this is disabled, Clang will not print the “test.c:28:8: ” part.

-f[no-]caret-diagnostics

Print source line and ranges from source code in diagnostic.

This option, which defaults to on, controls whether or not Clang prints the source line, source ranges, and caret when emitting a diagnostic. For example, when this is enabled, Clang will print something like:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

-f[no-]color-diagnostics

This option, which defaults to on when a color-capable terminal is detected, controls whether or not Clang prints diagnostics in color.

When this option is enabled, Clang will use colors to highlight specific parts of the diagnostic, e.g.,

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

When this is disabled, Clang will just print:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

If the NO_COLOR environment variable is defined and not empty (regardless of value), color diagnostics are disabled. If NO_COLOR is defined and -fcolor-diagnostics is passed on the command line, Clang will honor the command line argument.
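The precedence between the command line and NO_COLOR can be modelled as a small decision function. This is an illustrative sketch only, not Clang's implementation: the name use_color and its parameters are hypothetical, and the assumption that an explicit flag also overrides terminal detection is ours.

```python
from typing import Mapping, Optional


def use_color(flag: Optional[bool],
              env: Mapping[str, str],
              terminal_supports_color: bool) -> bool:
    """Hypothetical model of the color decision described in the text.

    flag is True for -fcolor-diagnostics, False for -fno-color-diagnostics,
    and None when neither was passed on the command line.
    """
    if flag is not None:
        # An explicit command line argument is honored, even if NO_COLOR is set.
        return flag
    if env.get("NO_COLOR", "") != "":
        # Defined and not empty, regardless of value.
        return False
    # Otherwise the default depends on terminal detection.
    return terminal_supports_color
```

An empty NO_COLOR value is treated the same as an unset variable, matching the "defined and not empty" wording.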

-fansi-escape-codes

Controls whether ANSI escape codes are used instead of the Windows Console API to output colored diagnostics. This option is only used on Windows and defaults to off.

-fdiagnostics-format=clang/msvc/vi

Changes diagnostic output format to better match IDEs and command line tools.

This option controls the output format of the filename, line number, and column printed in diagnostic messages. The options, and their effect on formatting a simple conversion diagnostic, follow:

clang (default)

t.c:3:11: warning: conversion specifies type 'char *' but the argument has type 'int'

msvc

t.c(3,11) : warning: conversion specifies type 'char *' but the argument has type 'int'

vi

t.c +3:11: warning: conversion specifies type 'char *' but the argument has type 'int'
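A tool that consumes these messages needs one pattern per format. The following sketch shows one way to do that; the names PATTERNS and parse_diagnostic are hypothetical, and the exact separators (a space before the + in the vi form, an optional space before the colon in the msvc form) are assumptions about the output rather than a specification.

```python
import re
from typing import Dict, Optional

_KIND = r"(?P<kind>note|remark|warning|error|fatal error)"
_TAIL = rf": {_KIND}: (?P<message>.*)$"

# One regular expression per -fdiagnostics-format value.
PATTERNS = {
    "clang": re.compile(r"^(?P<file>.+?):(?P<line>\d+):(?P<column>\d+)" + _TAIL),
    "msvc": re.compile(r"^(?P<file>.+?)\((?P<line>\d+),(?P<column>\d+)\) ?" + _TAIL),
    "vi": re.compile(r"^(?P<file>.+?) \+(?P<line>\d+):(?P<column>\d+)" + _TAIL),
}


def parse_diagnostic(text: str, fmt: str = "clang") -> Optional[Dict[str, object]]:
    """Split one diagnostic line into file, line, column, kind and message."""
    match = PATTERNS[fmt].match(text)
    if match is None:
        return None
    fields: Dict[str, object] = dict(match.groupdict())
    fields["line"] = int(match.group("line"))
    fields["column"] = int(match.group("column"))
    return fields
```

Keeping the location prefix separate from the shared tail means only the prefix has to change if another format is added.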

-f[no-]diagnostics-show-option

Enable [-Woption] information in diagnostic line.

This option, which defaults to on, controls whether or not Clang prints the associated warning group option name when outputting a warning diagnostic. For example, in this output:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

Passing -fno-diagnostics-show-option will prevent Clang from printing the [-Wextra-tokens] information in the diagnostic. This information tells you the flag needed to enable or disable the diagnostic, either from the command line or through #pragma GCC diagnostic.

-fdiagnostics-show-category=none/id/name

Enable printing category information in diagnostic line.

This option, which defaults to “none”, controls whether or not Clang prints the category associated with a diagnostic when emitting it. Each diagnostic may or may not have an associated category; if it has one, it is listed in the diagnostic categorization field of the diagnostic line (in the []’s).

For example, a format string warning will produce these three renditions based on the setting of this option:

t.c:3:11: warning: conversion specifies type 'char *' but the argument has type 'int' [-Wformat]
t.c:3:11: warning: conversion specifies type 'char *' but the argument has type 'int' [-Wformat,1]
t.c:3:11: warning: conversion specifies type 'char *' but the argument has type 'int' [-Wformat,FormatString]

This category can be used by clients that want to group diagnostics by category, so it should be a high level category. We want dozens of these, not hundreds or thousands of them.

-f[no-]save-optimization-record[=<format>]

Enable optimization remarks during compilation and write them to a separate file.

This option, which defaults to off, controls whether Clang writes optimization reports to a separate file. By recording diagnostics in a file, users can parse or sort the remarks in a convenient way.

By default, the serialization format is YAML.

The supported serialization formats are:

  • -fsave-optimization-record=yaml: A structured YAML format.

  • -fsave-optimization-record=bitstream: A binary format based on LLVM Bitstream.

The output file is controlled by -foptimization-record-file.

In the absence of an explicit output file, the file is chosen using the following scheme:

<base>.opt.<format>

where <base> is based on the output file of the compilation (whether it’s explicitly specified through -o or not) when used with -c or -S. For example:

  • clang -fsave-optimization-record -c in.c -o out.o will generate out.opt.yaml

  • clang -fsave-optimization-record -c in.c will generate in.opt.yaml
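The naming scheme for the plain -c case can be sketched as follows. The helper default_remarks_file is hypothetical and covers only what the two examples show; it assumes that, without -o, the object file is named after the source file and placed in the current directory, and it does not model the LTO, ThinLTO, or Darwin cases described below.

```python
from pathlib import PurePosixPath
from typing import Optional


def default_remarks_file(source: str,
                         output: Optional[str] = None,
                         fmt: str = "yaml") -> str:
    """Return <base>.opt.<format> for a -c compilation (illustrative only)."""
    if output is None:
        # Assumption: with -c and no -o, the object file is <stem>.o in the
        # current directory.
        output = PurePosixPath(source).with_suffix(".o").name
    # Drop the output file's extension and append .opt.<format>.
    return str(PurePosixPath(output).with_suffix(f".opt.{fmt}"))
```

For instance, default_remarks_file("in.c", "out.o") yields "out.opt.yaml", in line with the first example.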

When targeting (Thin)LTO, the base is derived from the output filename, and the extension is not dropped.

When targeting ThinLTO, the following scheme is used:

<base>.opt.<format>.thin.<num>.<format>

Darwin-only: when used for generating a linked binary from a source file (through an intermediate object file), the driver will invoke cc1 to generate a temporary object file. The temporary remark file will be emitted next to the object file, which will then be picked up by dsymutil and emitted in the .dSYM bundle. This is available for all formats except YAML.

For example:

clang -fsave-optimization-record=bitstream in.c -o out will generate

  • /var/folders/43/9y164hh52tv_2nrdxrj31nyw0000gn/T/a-9be59b.o

  • /var/folders/43/9y164hh52tv_2nrdxrj31nyw0000gn/T/a-9be59b.opt.bitstream

  • out

  • out.dSYM/Contents/Resources/Remarks/out

Darwin-only: compiling for multiple architectures will use the following scheme:

<base>-<arch>.opt.<format>

Note that this is incompatible with passing the -foptimization-record-file option.

-foptimization-record-file

Control the file to which optimization reports are written. This implies -fsave-optimization-record.

On Darwin platforms, this is incompatible with passing multiple -arch <arch> options.

-foptimization-record-passes

Only include passes which match a specified regular expression.

When optimization reports are being output (see -fsave-optimization-record), this option controls the passes that will be included in the final report.

If this option is not used, all the passes are included in the optimization record.

-f[no-]diagnostics-show-hotness

Enable profile hotness information in diagnostic line.

This option controls whether Clang prints the profile hotness associated with diagnostics in the presence of profile-guided optimization information. This is currently supported with optimization remarks (see Options to Emit Optimization Reports). The hotness information allows users to focus on the hot optimization remarks that are likely to be more relevant for run-time performance.

For example, in this output, the block containing the callsite of foo was executed 3000 times according to the profile data:

s.c:7:10: remark: foo inlined into bar (hotness: 3000) [-Rpass-analysis=inline]
  sum += foo(x, x - 2);
         ^

This option is implied when -fsave-optimization-record is used. Otherwise, it defaults to off.

-fdiagnostics-hotness-threshold

Prevent optimization remarks from being output if they do not have at least this hotness value.

This option, which defaults to zero, controls the minimum hotness an optimization remark would need in order to be output by Clang. This is currently supported with optimization remarks (see Options to Emit Optimization Reports) when profile hotness information in diagnostics is enabled (see -fdiagnostics-show-hotness).

-f[no-]diagnostics-fixit-info

Enable “FixIt” information in the diagnostics output.

This option, which defaults to on, controls whether or not Clang prints the information on how to fix a specific diagnostic underneath it when it knows. For example, in this output:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

Passing -fno-diagnostics-fixit-info will prevent Clang from printing the “//” line at the end of the message. This information is useful for users who may not understand what is wrong, but can be confusing for machine parsing.

-fdiagnostics-print-source-range-info

Print machine parsable information about source ranges.

This option makes Clang print information about source ranges in a machine parsable format after the file/line/column number information. The information is a simple sequence of brace enclosed ranges, where each range lists the start and end line/column locations. For example, in this output:

exprs.c:47:15:{47:8-47:14}{47:17-47:24}: error: invalid operands to binary expression ('int *' and '_Complex float')
   P = (P-42) + Gamma*4;
       ~~~~~~ ^ ~~~~~~~

The {}’s are generated by -fdiagnostics-print-source-range-info.

The printed column numbers count bytes from the beginning of the line; take care if your source contains multibyte characters.
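As a rough illustration of how a consumer might read the brace enclosed ranges, the sketch below extracts every {line:col-line:col} group from a diagnostic line. The name parse_source_ranges is hypothetical, and scanning the whole line (rather than only the location prefix) is a simplification.

```python
import re
from typing import List, Tuple

Position = Tuple[int, int]  # (line, column); columns count bytes

_RANGE = re.compile(r"\{(\d+):(\d+)-(\d+):(\d+)\}")


def parse_source_ranges(diagnostic: str) -> List[Tuple[Position, Position]]:
    """Return the (start, end) pairs printed between braces."""
    return [
        ((int(start_line), int(start_col)), (int(end_line), int(end_col)))
        for start_line, start_col, end_line, end_col in _RANGE.findall(diagnostic)
    ]
```

Applied to the sample diagnostic above, this yields one range for each operand of the binary expression.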

-fdiagnostics-parseable-fixits

Print Fix-Its in a machine parseable form.

This option makes Clang print available Fix-Its in a machine parseable format at the end of diagnostics. The following example illustrates the format:

fix-it:"t.cpp":{7:25-7:29}:"Gamma"

The range printed is a half-open range, so in this example the characters at column 25 up to but not including column 29 on line 7 in t.cpp should be replaced with the string “Gamma”. Either the range or the replacement string may be empty (representing strict insertions and strict erasures, respectively). Both the file name and the insertion string escape backslash (as “\\”), tabs (as “\t”), newlines (as “\n”), double quotes (as “\"”) and non-printable characters (as octal “\xxx”).

The printed column numbers count bytes from the beginning of the line; take care if your source contains multibyte characters.
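The format above can be consumed with a short parser. The following is a minimal sketch under stated assumptions: the names FixIt, unescape, parse_fixit and apply_to_line are hypothetical, octal escapes are assumed to always have three digits, and only fix-its that stay on a single line are applied.

```python
import re
from typing import NamedTuple, Tuple

_FIXIT = re.compile(
    r'^fix-it:"((?:[^"\\]|\\.)*)":'
    r'\{(\d+):(\d+)-(\d+):(\d+)\}:'
    r'"((?:[^"\\]|\\.)*)"$'
)
_SIMPLE = {"\\": b"\\", "t": b"\t", "n": b"\n", '"': b'"'}


class FixIt(NamedTuple):
    file: bytes
    start: Tuple[int, int]  # (line, column), 1-based, column in bytes
    end: Tuple[int, int]    # exclusive end of the half-open range
    replacement: bytes


def unescape(text: str) -> bytes:
    """Undo the backslash escapes listed in the text above."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("utf-8")
            i += 1
        elif text[i + 1] in _SIMPLE:
            out += _SIMPLE[text[i + 1]]
            i += 2
        else:
            # Assumed to be a three-digit octal escape such as \101.
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


def parse_fixit(line: str) -> FixIt:
    match = _FIXIT.match(line)
    if match is None:
        raise ValueError(f"not a fix-it line: {line!r}")
    name, sl, sc, el, ec, repl = match.groups()
    return FixIt(unescape(name), (int(sl), int(sc)), (int(el), int(ec)), unescape(repl))


def apply_to_line(source_line: bytes, fixit: FixIt) -> bytes:
    """Apply a single-line fix-it to the bytes of that source line."""
    if fixit.start[0] != fixit.end[0]:
        raise ValueError("multi-line ranges are not handled in this sketch")
    begin, end = fixit.start[1] - 1, fixit.end[1] - 1
    return source_line[:begin] + fixit.replacement + source_line[end:]
```

The source line is handled as bytes because the columns count bytes, which keeps the slicing correct when the line contains multibyte characters.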

-fno-elide-type

Turns off elision in template type printing.

The default for template type printing is to elide as many template arguments as possible, removing those which are the same in both template types, leaving only the differences. Adding this flag will print all the template arguments. If supported by the terminal, highlighting will still appear on differing arguments.

Default:

t.cc:4:5: note: candidate function not viable: no known conversion from 'vector<map<[...], map<float, [...]>>>' to 'vector<map<[...], map<double, [...]>>>' for 1st argument;

-fno-elide-type:

t.cc:4:5: note: candidate function not viable: no known conversion from 'vector<map<int, map<float, int>>>' to 'vector<map<int, map<double, int>>>' for 1st argument;

-fdiagnostics-show-template-tree

Template type diffing prints a text tree.

For diffing large templated types, this option will cause Clang to display the templates as an indented text tree, one argument per line, with differences marked inline. This is compatible with -fno-elide-type.

Default:

t.cc:4:5: note: candidate function not viable: no known conversion from 'vector<map<[...], map<float, [...]>>>' to 'vector<map<[...], map<double, [...]>>>' for 1st argument;

With -fdiagnostics-show-template-tree:

t.cc:4:5: note: candidate function not viable: no known conversion for 1st argument;
  vector<
    map<
      [...],
      map<
        [float != double],
        [...]>>>

-fcaret-diagnostics-max-lines:

Controls how many lines of code clang prints for diagnostics. By default, clang prints a maximum of 16 lines of code.

-fdiagnostics-show-line-numbers:

Controls whether clang will print a margin containing the line number on the left of each line of code it prints for diagnostics.

Default:

test.cpp:5:1: error: 'main' must return 'int'
    5 | void main() {}
      | ^~~~
      | int

With -fno-diagnostics-show-line-numbers:

test.cpp:5:1: error: 'main' must return 'int'
void main() {}
^~~~
int

Individual Warning Groups

TODO: Generate this from tblgen. Define one anchor per warning group.

-Wextra-tokens

Warn about excess tokens at the end of a preprocessor directive.

This option, which defaults to on, enables warnings about extra tokens at the end of preprocessor directives. For example:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^

These extra tokens are not strictly conforming, and are usually best handled by commenting them out.

-Wambiguous-member-template

Warn about unqualified uses of a member template whose name resolves to another template at the location of the use.

This option, which defaults to on, enables a warning in the following code:

template<typename T> struct set{};
template<typename T> struct trait { typedef const T& type; };
struct Value {
  template<typename T> void set(typename trait<T>::type value) {}
};
void foo() {
  Value v;
  v.set<double>(3.2);
}

C++ [basic.lookup.classref] requires this to be an error, but, because it’s hard to work around, Clang downgrades it to a warning as an extension.

-Wbind-to-temporary-copy

Warn about an unusable copy constructor when binding a reference to a temporary.

This option enables warnings about binding a reference to a temporary when the temporary doesn’t have a usable copy constructor. For example:

struct NonCopyable {
  NonCopyable();
private:
  NonCopyable(const NonCopyable&);
};
void foo(const NonCopyable&);
void bar() {
  foo(NonCopyable());  // Disallowed in C++98; allowed in C++11.
}

struct NonCopyable2 {
  NonCopyable2();
  NonCopyable2(NonCopyable2&);
};
void foo(const NonCopyable2&);
void bar() {
  foo(NonCopyable2());  // Disallowed in C++98; allowed in C++11.
}

Note that if NonCopyable2::NonCopyable2() has a default argument whose instantiation produces a compile error, that error will still be a hard error in C++98 mode even if this warning is turned off.

Options to Control Clang Crash Diagnostics

As unbelievable as it may sound, Clang does crash from time to time. Generally, this only occurs to those living on the bleeding edge. Clang goes to great lengths to assist you in filing a bug report. Specifically, Clang generates preprocessed source file(s) and associated run script(s) upon a crash. These files should be attached to a bug report to ease reproducibility of the failure. Below are the command line options to control the crash diagnostics.

-fcrash-diagnostics=<val>

Valid values are:

  • off (Disable auto-generation of preprocessed source files during a clang crash.)

  • compiler (Generate diagnostics for compiler crashes (default))

  • all (Generate diagnostics for all tools which support it)

-fno-crash-diagnostics

Disable auto-generation of preprocessed source files during a clang crash.

The -fno-crash-diagnostics flag can be helpful for speeding the process of generating a delta reduced test case.

-fcrash-diagnostics-dir=<dir>

Specify where to write the crash diagnostics files; defaults to the usual location for temporary files.

CLANG_CRASH_DIAGNOSTICS_DIR=<dir>

Like -fcrash-diagnostics-dir=<dir>, specifies where to write the crash diagnostics files, but with lower precedence than the option.

Clang is also capable of generating preprocessed source file(s) and associated run script(s) even without a crash. This is especially useful when trying to generate a reproducer for warnings or errors while using modules.

-gen-reproducer

Generates preprocessed source files, a reproducer script and, if relevant, a cache containing: built module pcm’s and all headers needed to rebuild the same modules.

Options to Emit Optimization Reports

Optimization reports trace, at a high level, all the major decisions made by compiler transformations. For instance, when the inliner decides to inline function foo() into bar(), or the loop unroller decides to unroll a loop N times, or the vectorizer decides to vectorize a loop body.

Clang offers a family of flags which the optimizers can use to emit a diagnostic in three cases:

  1. When the pass makes a transformation (-Rpass).

  2. When the pass fails to make a transformation (-Rpass-missed).

  3. When the pass determines whether or not to make a transformation (-Rpass-analysis).

NOTE: Although the discussion below focuses on -Rpass, the exact same options apply to -Rpass-missed and -Rpass-analysis.

Since there are dozens of passes inside the compiler, each of these flags takes a regular expression that identifies the name of the pass which should emit the associated diagnostic. For example, to get a report from the inliner, compile the code with:

$ clang -O2 -Rpass=inline code.cc -o code
code.cc:4:25: remark: foo inlined into bar [-Rpass=inline]
int bar(int j) { return foo(j, j - 2); }
                        ^

Note that remarks from the inliner are identified with [-Rpass=inline]. To request a report from every optimization pass, you should use -Rpass=.* (in fact, you can use any valid POSIX regular expression). However, do not expect a report from every transformation made by the compiler. Optimization remarks do not really make sense outside of the major transformations (e.g., inlining, vectorization, loop optimizations) and not every optimization pass supports this feature.

Note that when using profile-guided optimization information, profile hotness information can be included in the remarks (see -fdiagnostics-show-hotness).

Current limitations

  1. Optimization remarks that refer to function names will display the mangled name of the function. Since these remarks are emitted by the back end of the compiler, it does not know anything about the input language, nor its mangling rules.

  2. Some source locations are not displayed correctly. The front end has a more detailed source location tracking than the locations included in the debug info (e.g., the front end can locate code inside macro expansions). However, the locations used by -Rpass are translated from debug annotations. That translation can be lossy, which results in some remarks having no location information.

Options to Emit Resource Consumption Reports

These are options that report the execution time and consumed memory of different compilation steps.

-fproc-stat-report=

This option requests the driver to print the used memory and execution time of each compilation step. During execution, the clang driver calls different tools, such as the compiler, assembler, linker, etc. With this option the driver reports the total execution time, the execution time spent in user mode and the peak memory usage of each called tool. The value of the option specifies where the report is sent. If it specifies a regular file, the data are saved to this file in CSV format:

$ clang -fproc-stat-report=abc foo.c
$ cat abc
clang-11,"/tmp/foo-123456.o",92000,84000,87536
ld,"a.out",900,8000,53568

The data on each row represent:

  • file name of the tool executable,

  • output file name in quotes,

  • total execution time in microseconds,

  • execution time in user mode in microseconds,

  • peak memory usage in Kb.
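Because the report is plain CSV with the five fields listed above, it can be read with a standard CSV parser. The sketch below is illustrative; the names ToolStat and parse_proc_stat are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple


class ToolStat(NamedTuple):
    tool: str         # file name of the tool executable
    output: str       # output file name (quotes removed by the CSV reader)
    total_us: int     # total execution time in microseconds
    user_us: int      # execution time in user mode in microseconds
    peak_mem_kb: int  # peak memory usage in Kb


def parse_proc_stat(report: str) -> List[ToolStat]:
    """Parse the CSV text written by -fproc-stat-report=<file>."""
    rows = csv.reader(io.StringIO(report))
    return [
        ToolStat(row[0], row[1], int(row[2]), int(row[3]), int(row[4]))
        for row in rows
        if row  # skip empty lines
    ]
```

Since new data is appended to the report file, summing total_us over all rows gives the accumulated time across every recorded tool invocation.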

It is possible to specify this option without any value. In this case statistics are printed on standard output in human readable format:

$ clang -fproc-stat-report foo.c
clang-11: output=/tmp/foo-855a8e.o, total=68.000 ms, user=60.000 ms, mem=86920 Kb
ld: output=a.out, total=8.000 ms, user=4.000 ms, mem=52320 Kb

The report file specified in the option is locked for write, so this option can be used to collect statistics in parallel builds. The report file is not cleared; new data is appended to it, making it possible to accumulate build statistics.

You can also use environment variables to control the process statistics reporting. Setting CC_PRINT_PROC_STAT to 1 enables the feature; the report goes to stdout in human readable format. Setting CC_PRINT_PROC_STAT_FILE to a fully qualified file path makes it report process statistics to the given file in the CSV format. Specifying a relative path will likely lead to multiple files with the same name created in different directories, since the path is relative to a changing working directory.

These environment variables are handy when you need to request the statistics report without changing your build scripts or altering the existing set of compiler options. Note that -fproc-stat-report takes precedence over CC_PRINT_PROC_STAT and CC_PRINT_PROC_STAT_FILE.

$ export CC_PRINT_PROC_STAT=1
$ export CC_PRINT_PROC_STAT_FILE=~/project-build-proc-stat.csv
$ make

Other Options

Clang options that don’t fit neatly into other categories.

-fgnuc-version=

This flag controls the value of __GNUC__ and related macros. This flag does not enable or disable any GCC extensions implemented in Clang. Setting the version to zero causes Clang to leave __GNUC__ and other GNU-namespaced macros, such as __GXX_WEAK__, undefined.

-MV

When emitting a dependency file, use formatting conventions appropriate for NMake or Jom. Ignored unless another option causes Clang to emit a dependency file.

When Clang emits a dependency file (e.g., you supplied the -M option) most filenames can be written to the file without any special formatting. Different Make tools will treat different sets of characters as “special” and use different conventions for telling the Make tool that the character is actually part of the filename. Normally Clang uses backslash to “escape” a special character, which is the convention used by GNU Make. The -MV option tells Clang to put double-quotes around the entire filename, which is the convention used by NMake and Jom.
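The difference between the two conventions can be illustrated with a small sketch. This is not Clang's implementation: the helper format_dep_filename is hypothetical, and treating only the space character as special (and leaving other filenames untouched) is a simplifying assumption.

```python
def format_dep_filename(name: str, nmake: bool = False) -> str:
    """Format one filename for a dependency file (illustrative only)."""
    if " " not in name:
        # Nothing special in the name: written without any formatting.
        return name
    if nmake:
        # NMake/Jom convention (-MV): quote the entire filename.
        return f'"{name}"'
    # GNU Make convention: escape the special character with a backslash.
    return name.replace(" ", "\\ ")
```

For the filename my header.h, the GNU Make form is my\ header.h and the NMake form is "my header.h".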

-femit-dwarf-unwind=<value>

When to emit DWARF unwind (EH frame) info. This is a Mach-O-specific option.

Valid values are:

  • no-compact-unwind - Only emit DWARF unwind when compact unwind encodings aren’t available. This is the default for arm64.

  • always - Always emit DWARF unwind regardless.

  • default - Use the platform-specific default (always for all non-arm64 platforms).

no-compact-unwind is a performance optimization: Clang will emit smaller object files that are more quickly processed by the linker. This may cause binary compatibility issues on older x86_64 targets, however, so use it with caution.

-fdisable-block-signature-string

Instruct clang not to emit the signature string for blocks. Disabling the string can potentially break existing code that relies on it. Users should carefully consider this possibility when using the flag.

Configuration files

Configuration files group command-line options and allow all of them to be specified just by referencing the configuration file. They may be used, for example, to collect options required to tune compilation for a particular target, such as -L, -I, -l, --sysroot, codegen options, etc.

Configuration files can be either specified on the command line or loaded from default locations. If both variants are present, the default configuration files are loaded first.

The command line option --config= can be used to specify explicit configuration files in a Clang invocation. If the option is used multiple times, all specified files are loaded, in order. For example:

clang --config=/home/user/cfgs/testing.txt
clang --config=debug.cfg --config=runtimes.cfg

If the provided argument contains a directory separator, it is considered as a file path, and options are read from that file. Otherwise the argument is treated as a file name and is searched for sequentially in the directories:

  • user directory,

  • system directory,

  • the directory where the Clang executable resides.

Both user and system directories for configuration files can be specified either during build or during runtime. At build time, use CLANG_CONFIG_FILE_USER_DIR and CLANG_CONFIG_FILE_SYSTEM_DIR. At runtime, use the --config-user-dir= and --config-system-dir= command line options. Specifying config directories at runtime overrides the config directories set at build time. The first file found is used. It is an error if the required file cannot be found.
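The lookup for a --config= argument can be sketched as below. This is a model of the rules in the text, not the driver's code: the function find_config and the directory paths in the usage example are hypothetical, and only the forward slash is treated as a directory separator.

```python
import posixpath
from typing import Callable, Sequence


def find_config(arg: str,
                search_dirs: Sequence[str],
                exists: Callable[[str], bool] = posixpath.isfile) -> str:
    """Resolve a --config= argument.

    search_dirs must be ordered: user directory, system directory, then the
    directory where the Clang executable resides.
    """
    if "/" in arg:
        # Contains a directory separator: treated as a file path.
        candidates = [arg]
    else:
        # Plain file name: searched for sequentially in the directories.
        candidates = [posixpath.join(d, arg) for d in search_dirs if d]
    for path in candidates:
        if exists(path):
            return path  # the first file found is used
    raise FileNotFoundError(f"configuration file not found: {arg}")
```

For example, with search_dirs set to ["/home/user/.config/clang", "/etc/clang", "/opt/llvm/bin"] and a debug.cfg present only in the last two directories, find_config("debug.cfg", search_dirs) returns "/etc/clang/debug.cfg".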

The default configuration files are searched for in the same directories following the rules described in the next paragraphs. Loading default configuration files can be disabled entirely via passing the --no-default-config flag.

First, the algorithm searches for a configuration file named <triple>-<driver>.cfg where triple is the triple for the target being built for, and driver is the name of the currently used driver. The algorithm first attempts to use the canonical name for the driver used, then falls back to the one found in the executable name.

The following canonical driver names are used:

  • clang for the gcc driver (used to compile C programs)

  • clang++ for the gxx driver (used to compile C++ programs)

  • clang-cpp for the cpp driver (pure preprocessor)

  • clang-cl for the cl driver

  • flang for the flang driver

  • clang-dxc for the dxc driver

For example, when calling x86_64-pc-linux-gnu-clang-g++, the driver will first attempt to use the configuration file named:

x86_64-pc-linux-gnu-clang++.cfg

If this file is not found, it will attempt to use the name found in the executable instead:

x86_64-pc-linux-gnu-clang-g++.cfg

Note that options such as --driver-mode=, --target=, -m32 affect the search algorithm. For example, the aforementioned executable called with the -m32 argument will instead search for:

i386-pc-linux-gnu-clang++.cfg

If none of the aforementioned files are found, the driver will instead search for separate driver and target configuration files and attempt to load both. The former is named <driver>.cfg while the latter is named <triple>.cfg. Similarly to the previous variants, the canonical driver name will be preferred, and the compiler will fall back to the actual name.

For example, x86_64-pc-linux-gnu-clang-g++ will attempt to load two configuration files named respectively:

clang++.cfg
x86_64-pc-linux-gnu.cfg

with fallback to trying:

clang-g++.cfg
x86_64-pc-linux-gnu.cfg

It is not an error if either of these files is not found.
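The search order for default configuration files can be summarised in a short sketch. It models only the file names, not the directory search, and the function default_configs with its exists callback is hypothetical.

```python
from typing import Callable, List


def default_configs(triple: str,
                    canonical_driver: str,
                    actual_driver: str,
                    exists: Callable[[str], bool]) -> List[str]:
    """Return the default configuration file names that would be loaded."""
    drivers = (canonical_driver, actual_driver)
    # Step 1: a combined <triple>-<driver>.cfg, canonical driver name first.
    for driver in drivers:
        combined = f"{triple}-{driver}.cfg"
        if exists(combined):
            return [combined]
    # Step 2: separate <driver>.cfg and <triple>.cfg; either may be missing.
    loaded = []
    for driver in drivers:
        if exists(f"{driver}.cfg"):
            loaded.append(f"{driver}.cfg")
            break
    if exists(f"{triple}.cfg"):
        loaded.append(f"{triple}.cfg")
    return loaded
```

For the executable x86_64-pc-linux-gnu-clang-g++, the call would pass "x86_64-pc-linux-gnu" as the triple, "clang++" as the canonical driver name and "clang-g++" as the name taken from the executable.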

The configuration file consists of command-line options specified on one or more lines. Lines composed of whitespace characters only are ignored, as are lines in which the first non-blank character is #. Long options may be split between several lines by a trailing backslash. Here is an example of a configuration file:

# Several options on line
-c --target=x86_64-unknown-linux-gnu

# Long option split between lines
-I/usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../\
include/c++/5.4.0

# other config files may be included
@linux.options
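The line-level rules (blank lines, # comments and trailing-backslash continuation) can be approximated as follows. This is a simplified sketch: the function parse_config_text is hypothetical, options are split on whitespace only (quoting is not handled), and @file directives are returned as ordinary tokens rather than expanded.

```python
from typing import List


def parse_config_text(text: str) -> List[str]:
    """Split configuration file text into option tokens (approximation)."""
    options: List[str] = []
    pending = ""  # text carried over from lines ending in a backslash
    for raw in text.splitlines():
        if not pending and (not raw.strip() or raw.lstrip().startswith("#")):
            continue  # whitespace-only line or comment
        if raw.endswith("\\"):
            pending += raw[:-1]  # join with the next line
            continue
        options.extend((pending + raw).split())
        pending = ""
    if pending:
        options.extend(pending.split())
    return options
```

On the example file above this produces four tokens: -c, the --target= option, the joined -I path, and @linux.options.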

Files included by @file directives in configuration files are resolved relative to the including file. For example, if a configuration file ~/.llvm/target.cfg contains the directive @os/linux.opts, the file linux.opts is searched for in the directory ~/.llvm/os. Another way to include a file’s content is to use the command line option --config=. It works similarly, but the included file is searched for using the rules for configuration files.

To generate paths relative to the configuration file, the <CFGDIR> token may be used. This will expand to the absolute path of the directory containing the configuration file.

In cases where a configuration file is deployed alongside SDK contents, the SDK directory can remain fully portable by using <CFGDIR> prefixed paths. In this way, the user may only need to specify a root configuration file with --config= to establish every aspect of the SDK with the compiler:

--target=foo
-isystem <CFGDIR>/include
-L <CFGDIR>/lib
-T <CFGDIR>/ldscripts/link.ld
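The effect of the <CFGDIR> token can be pictured as a simple textual substitution. The helper expand_cfgdir and the path /opt/sdk/sdk.cfg below are hypothetical, chosen only to illustrate the expansion.

```python
import posixpath


def expand_cfgdir(option: str, config_path: str) -> str:
    """Replace <CFGDIR> with the absolute directory of the configuration file."""
    cfg_dir = posixpath.dirname(posixpath.abspath(config_path))
    return option.replace("<CFGDIR>", cfg_dir)
```

With a configuration file at /opt/sdk/sdk.cfg, the option -isystem <CFGDIR>/include becomes -isystem /opt/sdk/include, so the SDK directory can be moved without editing the file.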

Usually, config file options are placed before command-line options, regardless of the actual operation to be performed. An exception is made for the options prefixed with the $ character. These will be used only when the linker is being invoked, and added after all of the command-line specified linker inputs. Here is an example of $-prefixed options:

$-Wl,-Bstatic $-lm
$-Wl,-Bshared

Language and Target-Independent Features

Freestanding Builds

Passing the -ffreestanding flag causes Clang to build for a freestanding (rather than a hosted) environment. The flag has the following effects:

  • the __STDC_HOSTED__ predefined macro will expand to 0,

  • builtin functions are disabled by default (-fno-builtin),

  • unwind tables are disabled by default (-fno-asynchronous-unwind-tables -fno-unwind-tables), and

  • the global main function is not treated as a special function.

An implementation of the following runtime library functions must always be provided with the usual semantics, as Clang will generate calls to them:

  • memcpy,

  • memmove, and

  • memset.

Clang does not, by itself, provide a full “conforming freestanding implementation”. If you wish to have a conforming freestanding implementation, you must provide a freestanding C library. While Clang provides some of the required header files, it does not provide all of them, nor any library implementations.

Conversely, when -ffreestanding is specified, Clang does not require you to provide a conforming freestanding implementation library. Clang will not make any assumptions as to the availability or semantics of standard-library functions other than those mentioned above.
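As an illustration of the three required functions, the following is a minimal, unoptimized sketch of what a freestanding project might supply. The fs_ prefix is a hypothetical naming choice so that the snippet can also be compiled and tested in a hosted environment; in a real freestanding build the functions would carry the standard names memcpy, memmove, and memset. Only headers that Clang itself provides are used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: in a freestanding build these would be called
   memcpy, memmove, and memset. */

/* Copies n bytes from src to dst; the regions must not overlap. */
void *fs_memcpy(void *dst, const void *src, size_t n) {
  unsigned char *d = dst;
  const unsigned char *s = src;
  for (size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}

/* Copies n bytes from src to dst; the regions may overlap. */
void *fs_memmove(void *dst, const void *src, size_t n) {
  unsigned char *d = dst;
  const unsigned char *s = src;
  if ((uintptr_t)d < (uintptr_t)s) {
    for (size_t i = 0; i < n; ++i) /* destination is lower: copy forwards */
      d[i] = s[i];
  } else {
    for (size_t i = n; i > 0; --i) /* destination is higher: copy backwards */
      d[i - 1] = s[i - 1];
  }
  return dst;
}

/* Fills n bytes at dst with the low byte of value. */
void *fs_memset(void *dst, int value, size_t n) {
  unsigned char *d = dst;
  for (size_t i = 0; i < n; ++i)
    d[i] = (unsigned char)value;
  return dst;
}
```

A production implementation would normally copy in larger units than single bytes; the byte loops here only show the required semantics.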

Controlling Errors and Warnings

Clang provides a number of ways to control which code constructs cause it to emit errors and warning messages, and how they are displayed to the console.

Controlling How Clang Displays Diagnostics

When Clang emits a diagnostic, it includes rich information in the output, and gives you fine-grained control over which information is printed. Clang can print the following information, and these are the options that control it:

  1. A file/line/column indicator that shows exactly where the diagnostic occurs in your code [-fshow-column, -fshow-source-location].

  2. A categorization of the diagnostic as a note, warning, error, or fatal error.

  3. A text string that describes what the problem is.

  4. An option that indicates how to control the diagnostic (for diagnostics that support it) [-fdiagnostics-show-option].

  5. A high-level category for the diagnostic for clients that want to group diagnostics by class (for diagnostics that support it) [-fdiagnostics-show-category].

  6. The line of source code that the issue occurs on, along with a caret and ranges that indicate the important locations [-fcaret-diagnostics].

  7. “FixIt” information, which is a concise explanation of how to fix the problem (when Clang is certain it knows) [-fdiagnostics-fixit-info].

  8. A machine-parsable representation of the ranges involved (off by default) [-fdiagnostics-print-source-range-info].

For more information please see Formatting of Diagnostics.

Diagnostic Mappings

All diagnostics are mapped into one of these 6 classes:

  • Ignored

  • Note

  • Remark

  • Warning

  • Error

  • Fatal

Diagnostic Categories

Though not shown by default, diagnostics may each be associated with a high-level category. This category is intended to make it possible to triage builds that produce a large number of errors or warnings in a grouped way.

Categories are not shown by default, but they can be turned on with the -fdiagnostics-show-category option. When set to “name”, the category is printed textually in the diagnostic output. When it is set to “id”, a category number is printed. The mapping of category names to category ids can be obtained by running ‘clang --print-diagnostic-categories’.

Controlling Diagnostics via Command Line Flags

TODO: -W flags, -pedantic, etc

Controlling Diagnostics via Pragmas

Clang can also control what diagnostics are enabled through the use of pragmas in the source code. This is useful for turning off specific warnings in a section of source code. Clang supports GCC’s pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang. GCC will ignore #pragma clang diagnostic, though.

The pragma may control any warning that can be used from the command line. Warnings may be set to ignored, warning, error, or fatal. The following example code will tell Clang or GCC to ignore the -Wall warnings:

#pragma GCC diagnostic ignored "-Wall"

Clang also allows you to push and pop the current warning state. This is particularly useful when writing a header file that will be compiled by other people, because you don’t know what warning flags they build with.

In the example below, -Wextra-tokens is ignored for only a single line of code, after which the diagnostics return to whatever state had previously existed.

    #if foo
    #endif foo // warning: extra tokens at end of #endif directive

    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wextra-tokens"

    #if foo
    #endif foo // no warning

    #pragma GCC diagnostic pop

The push and pop pragmas will save and restore the full diagnostic state of the compiler, regardless of how it was set. Note that while Clang supports the GCC pragma, Clang and GCC do not support the exact same set of warnings, so even when using GCC-compatible pragmas there is no guarantee that they will have identical behaviour on both compilers.
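To illustrate the header-file use case, the following sketch wraps a single function in a push/pop pair so that an intentionally unused variable does not warn, whatever flags the including project builds with. The function name answer and the variable scratch are hypothetical; -Wunused-variable is used here because it is a warning group that both Clang and GCC recognize.

```c
/* Save the includer's diagnostic state before changing anything. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"

static inline int answer(void) {
  int scratch = 0; /* unused on purpose; -Wunused-variable is ignored here */
  return 42;
}

/* Restore whatever state the including file had. */
#pragma GCC diagnostic pop
```

Code that follows the pop is diagnosed with the includer's own settings again, so the suppression does not leak out of the header.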

Clang also doesn’t yet support the GCC behavior for a #pragma diagnostic pop that doesn’t have a corresponding #pragma diagnostic push. In this case GCC pretends that there is a #pragma diagnostic push at the very beginning of the source file, so an “unpaired” #pragma diagnostic pop matches that implicit push. This makes a difference for #pragma GCC diagnostic ignored directives that are not guarded by push and pop. Refer to the GCC documentation for details.

Like GCC, Clang accepts the ignored, warning, error, and fatal severity levels. They can be used to change the severity of a particular diagnostic for a region of a source file. A notable difference from GCC is that a diagnostic not enabled via command-line arguments can’t be enabled this way yet.

Some diagnostics associated with a -W flag have the error severity by default. They can be ignored or downgraded to warnings:

    // C only
    #pragma GCC diagnostic warning "-Wimplicit-function-declaration"
    int main(void) { puts(""); }

In addition to controlling warnings and errors generated by the compiler, it is possible to generate custom warning and error messages through the following pragmas:

    // The following will produce warning messages
    #pragma message "some diagnostic message"
    #pragma GCC warning "TODO: replace deprecated feature"

    // The following will produce an error message
    #pragma GCC error "Not supported"

These pragmas operate similarly to the #warning and #error preprocessor directives, except that they may also be embedded into preprocessor macros via the C99 _Pragma operator, for example:

    #define STR(X) #X
    #define DEFER(M,...) M(__VA_ARGS__)
    #define CUSTOM_ERROR(X) _Pragma(STR(GCC error(X " at line " DEFER(STR,__LINE__))))

    CUSTOM_ERROR("Feature not available");

Controlling Diagnostics in System Headers

Warnings are suppressed when they occur in system headers. By default, an included file is treated as a system header if it is found in an include path specified by -isystem, but this can be overridden in several ways.

The system_header pragma can be used to mark the current file as being a system header. No warnings will be produced from the location of the pragma onwards within the same file.

    #if foo
    #endif foo // warning: extra tokens at end of #endif directive

    #pragma clang system_header

    #if foo
    #endif foo // no warning

The --system-header-prefix= and --no-system-header-prefix= command-line arguments can be used to override whether subsets of an include path are treated as system headers. When the name in a #include directive is found within a header search path and starts with a system prefix, the header is treated as a system header. The last prefix on the command line which matches the specified header name takes precedence. For instance:

    $ clang -Ifoo -isystem bar --system-header-prefix=x/ \
        --no-system-header-prefix=x/y/

Here, #include "x/a.h" is treated as including a system header, even if the header is found in foo, and #include "x/y/b.h" is treated as not including a system header, even if the header is found in bar.

A #include directive which finds a file relative to the current directory is treated as including a system header if the including file is treated as a system header.

Controlling Deprecation Diagnostics in Clang-Provided C Runtime Headers

Clang is responsible for providing some of the C runtime headers that cannot be provided by a platform CRT, such as implementation limits or when compiling in freestanding mode. Define the _CLANG_DISABLE_CRT_DEPRECATION_WARNINGS macro prior to including such a C runtime header to disable the deprecation warnings. Note that the C Standard Library headers are allowed to transitively include other standard library headers (see 7.1.2p5), and so the most appropriate use of this macro is to set it within the build system using -D or before any include directives in the translation unit.

    #define _CLANG_DISABLE_CRT_DEPRECATION_WARNINGS
    #include <stdint.h>    // Clang CRT deprecation warnings are disabled.
    #include <stdatomic.h> // Clang CRT deprecation warnings are disabled.

Enabling All Diagnostics

In addition to the traditional -W flags, one can enable all diagnostics by passing -Weverything. This works as expected with -Werror, and also includes the warnings from -pedantic. Some diagnostics contradict each other; therefore, users of -Weverything often disable many diagnostics, such as -Wno-c++98-compat and -Wno-c++-compat, because they contradict recent C++ standards.

Since -Weverything enables every diagnostic, we generally don’t recommend using it. -Wall -Wextra are a better choice for most projects. Using -Weverything means that updating your compiler is more difficult because you’re exposed to experimental diagnostics which might be of lower quality than the default ones. If you do use -Weverything, then we advise that you address all new compiler diagnostics as they get added to Clang, either by fixing everything they find or by explicitly disabling that diagnostic with its corresponding -Wno- option.

Note that when combined with -w (which disables all warnings), disabling all warnings wins.

Controlling Diagnostics via Suppression Mappings

Warning suppression mappings enable users to suppress Clang’s diagnostics at a per-file granularity. This allows enforcing diagnostics in specific parts of the project even if there are violations in some headers.

    $ cat mappings.txt
    [unused]
    src:foo/*

    $ clang --warning-suppression-mappings=mappings.txt -Wunused foo/bar.cc
    # This compilation won't emit any unused findings for sources under foo/
    # directory. But it'll still complain for all the other sources, e.g:
    $ cat foo/bar.cc
    #include "dir/include.h" // Clang flags unused declarations here.
    #include "foo/include.h" // but unused warnings under this source is omitted.
    #include "next_to_bar_cc.h" // as are unused warnings from this header file.
    // Further, unused warnings in the remainder of bar.cc are also omitted.

See Warning suppression mappings for details about the file format and functionality.

Controlling Static Analyzer Diagnostics

While not strictly part of the compiler, the diagnostics from Clang’s static analyzer can also be influenced by the user via changes to the source code. See the available annotations and the analyzer’s FAQ page for more information.

Precompiled Headers

Precompiled headers are a general approach employed by many compilers to reduce compilation time. The underlying motivation of the approach is that it is common for the same (and often large) header files to be included by multiple source files. Consequently, compile times can often be greatly improved by caching some of the (redundant) work done by a compiler to process headers. Precompiled header files, which represent one of many ways to implement this optimization, are literally files that represent an on-disk cache that contains the vital information necessary to reduce some of the work needed to process a corresponding header file. While details of precompiled headers vary between compilers, precompiled headers have been shown to be highly effective at speeding up program compilation on systems with very large system headers (e.g., macOS).

Generating a PCH File

To generate a PCH file using Clang, one invokes Clang with the -x <language>-header option. This mirrors the interface in GCC for generating PCH files:

    $ gcc -x c-header test.h -o test.h.gch
    $ clang -x c-header test.h -o test.h.pch

Using a PCH File

A PCH file can then be used as a prefix header when a -include-pch option is passed to clang:

    $ clang -include-pch test.h.pch test.c -o test

The clang driver will check if the PCH file test.h.pch is available; if so, the contents of test.h (and the files it includes) will be processed from the PCH file. Otherwise, Clang will report an error.

Note

Clang does not automatically use PCH files for headers that are directly included within a source file or indirectly via -include. For example:

    $ clang -x c-header test.h -o test.h.pch
    $ cat test.c
    #include "test.h"
    $ clang test.c -o test

In this example, clang will not automatically use the PCH file for test.h since test.h was included directly in the source file and not specified on the command line using -include-pch.

Ignoring a PCH File

To ignore PCH options, a -ignore-pch option is passed to clang:

    $ clang -x c-header test.h -Xclang -ignore-pch -o test.h.pch
    $ clang -include-pch test.h.pch -Xclang -ignore-pch test.c -o test

This option disables precompiled headers and overrides -emit-pch and -include-pch. test.h.pch is not generated and not used as a prefix header.

Relocatable PCH Files

It is sometimes necessary to build a precompiled header from headers that are not yet in their final, installed locations. For example, one might build a precompiled header within the build tree that is then meant to be installed alongside the headers. Clang permits the creation of “relocatable” precompiled headers, which are built with a given path (into the build directory) and can later be used from an installed location.

To build a relocatable precompiled header, place your headers into a subdirectory whose structure mimics the installed location. For example, if you want to build a precompiled header for the header mylib.h that will be installed into /usr/include, create a subdirectory build/usr/include and place the header mylib.h into that subdirectory. If mylib.h depends on other headers, then they can be stored within build/usr/include in a way that mimics the installed location.

Building a relocatable precompiled header requires two additional arguments. First, pass the --relocatable-pch flag to indicate that the resulting PCH file should be relocatable. Second, pass -isysroot /path/to/build, which makes all includes for your library relative to the build directory. For example:

    # clang -x c-header --relocatable-pch -isysroot /path/to/build /path/to/build/mylib.h mylib.h.pch

When loading the relocatable PCH file, the various headers used in the PCH file are found from the system header root. For example, mylib.h can be found in /usr/include/mylib.h. If the headers are installed in some other system root, the -isysroot option can be used to provide a different system root from which the headers will be based. For example, -isysroot /Developer/SDKs/MacOSX10.4u.sdk will look for mylib.h in /Developer/SDKs/MacOSX10.4u.sdk/usr/include/mylib.h.

Relocatable precompiled headers are intended to be used in a limited number of cases where the compilation environment is tightly controlled and the precompiled header cannot be generated after headers have been installed.

Controlling Floating Point Behavior

Clang provides a number of ways to control floating point behavior, including with command line options and source pragmas. This section describes the various floating point semantic modes and the corresponding options.

Floating Point Semantic Modes

Mode                      Values
------------------------  --------------------------------------------------
ffp-exception-behavior    {ignore, strict, maytrap}
fenv_access               {off, on} (none)
frounding-math            {dynamic, tonearest, downward, upward, towardzero}
ffp-contract              {on, off, fast, fast-honor-pragmas}
fdenormal-fp-math         {IEEE, PreserveSign, PositiveZero}
fdenormal-fp-math-fp32    {IEEE, PreserveSign, PositiveZero}
fmath-errno               {on, off}
fhonor-nans               {on, off}
fhonor-infinities         {on, off}
fsigned-zeros             {on, off}
freciprocal-math          {on, off}
fallow-approximate-fns    {on, off}
fassociative-math         {on, off}
fcomplex-arithmetic       {basic, improved, full, promoted}

The following table describes the option settings that correspond to the four floating point semantic models: precise (the default), strict, fast, and aggressive.

Floating Point Models

Mode                   Precise    Strict   Fast       Aggressive
---------------------  ---------  -------  ---------  ----------
except_behavior        ignore     strict   ignore     ignore
fenv_access            off        on       off        off
rounding_mode          tonearest  dynamic  tonearest  tonearest
contract               on         off      fast       fast
support_math_errno     on         on       off        off
no_honor_nans          off        off      off        on
no_honor_infinities    off        off      off        on
no_signed_zeros        off        off      on         on
allow_reciprocal       off        off      on         on
allow_approximate_fns  off        off      on         on
allow_reassociation    off        off      on         on
complex_arithmetic     full       full     promoted   basic

The -ffp-model option does not modify the fdenormal-fp-math setting, but it does have an impact on whether crtfastmath.o is linked. Because linking crtfastmath.o has a global effect on the program, and because the global denormal handling can be changed in other ways, the state of fdenormal-fp-math handling cannot be assumed in any function based on fp-model. See A note about crtfastmath.o for more details.

-ffast-math

Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include:

  • Floating-point math obeys regular algebraic rules for real numbers (e.g. + and * are associative, x/y == x * (1/y), and (a + b) * c == a * c + b * c),

  • No NaN or infinite values will be operands or results of floating-point operations,

  • +0 and -0 may be treated as interchangeable.

-ffast-math also defines the __FAST_MATH__ preprocessor macro. Some math libraries recognize this macro and change their behavior. With the exception of -ffp-contract=fast, using any of the options below to disable any of the individual optimizations in -ffast-math will cause __FAST_MATH__ to no longer be set. -ffast-math enables -fcx-limited-range.

This option implies:

  • -fno-honor-infinities

  • -fno-honor-nans

  • -fapprox-func

  • -fno-math-errno

  • -ffinite-math-only

  • -fassociative-math

  • -freciprocal-math

  • -fno-signed-zeros

  • -fno-trapping-math

  • -fno-rounding-math

  • -ffp-contract=fast

Note: -ffast-math causes crtfastmath.o to be linked with code unless -shared or -mno-daz-ftz is present. See A note about crtfastmath.o for more details.
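Code that depends on NaN or infinity handling sometimes needs to detect these modes at compile time. The sketch below reports whether the translation unit was built with __FAST_MATH__ defined and whether __FINITE_MATH_ONLY__ is in effect. The function names are hypothetical, and the snippet assumes nothing beyond those two predefined macros.

```c
/* Returns 1 if this translation unit was compiled with __FAST_MATH__
   defined (for example via -ffast-math), and 0 otherwise. */
int built_with_fast_math(void) {
#ifdef __FAST_MATH__
  return 1;
#else
  return 0;
#endif
}

/* Returns 1 if the compiler may assume that no NaN or infinite values
   occur (for example via -ffinite-math-only), and 0 otherwise. */
int built_with_finite_math_only(void) {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
  return 1;
#else
  return 0;
#endif
}
```

A library could use checks like these to select a code path that avoids NaN tests, since such tests may be optimized away when finite-only math is assumed.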

-fno-fast-math

Disable fast-math mode. This option disables unsafe floating-point optimizations by preventing the compiler from making any transformations that could affect the results.

This option implies:

  • -fhonor-infinities

  • -fhonor-nans

  • -fno-approx-func

  • -fno-finite-math-only

  • -fno-associative-math

  • -fno-reciprocal-math

  • -fsigned-zeros

  • -ffp-contract=on

Also, this option resets the following options to their target-dependent defaults.

  • -f[no-]math-errno

There is ambiguity about how -ffp-contract, -ffast-math, and -fno-fast-math behave when combined. To keep the value of -ffp-contract consistent, we define this set of rules:

  • -ffast-math sets -ffp-contract to fast.

  • -fno-fast-math sets -ffp-contract to on (fast for CUDA and HIP).

  • If -ffast-math and -ffp-contract are both seen, but -ffast-math is not followed by -fno-fast-math, -ffp-contract will be given the value of whichever option was last seen.

  • If -fno-fast-math is seen and -ffp-contract has been seen at least once, -ffp-contract will get the value of the last seen -ffp-contract.

  • If -fno-fast-math is seen and -ffp-contract has not been seen, the -ffp-contract setting is determined by the default value of -ffp-contract.

Note: -fno-fast-math causes crtfastmath.o to not be linked with code unless -mdaz-ftz is present.

-fdenormal-fp-math=<value>

Select which denormal numbers the code is permitted to require.

Valid values are:

  • ieee - IEEE 754 denormal numbers

  • preserve-sign - the sign of a flushed-to-zero number is preserved in the sign of 0

  • positive-zero - denormals are flushed to positive zero

The default value depends on the target. For most targets, it defaults to ieee.
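One way to observe denormal handling at run time is to halve the smallest normal double and check whether the result survives as a nonzero value. The sketch below assumes an IEEE 754 binary64 double. The function name is hypothetical, and volatile is used only to keep the division from being folded at compile time.

```c
#include <float.h>

/* Returns 1 if DBL_MIN / 2 is still nonzero at run time (denormals are
   preserved), and 0 if the result was flushed to zero. */
int subnormals_preserved(void) {
  volatile double smallest_normal = DBL_MIN;
  volatile double half = smallest_normal / 2.0;
  return half > 0.0;
}
```

With IEEE denormal handling and no flush-to-zero mode enabled in the floating-point environment, this is expected to return 1. Because the hardware mode can be changed globally (for example by linking crtfastmath.o), the result reflects the running process rather than the flag alone.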

-f[no-]strict-float-cast-overflow

When a floating-point value is not representable in a destination integer type, the code has undefined behavior according to the language standard. By default, Clang will not guarantee any particular result in that case. With -fno-strict-float-cast-overflow, Clang will saturate towards the smallest and largest representable integer values instead. NaNs will be converted to zero. Defaults to -fstrict-float-cast-overflow.
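Code that needs a defined result on every compiler, regardless of this flag, can perform the saturation explicitly. The helper below is a sketch that mirrors the behavior described above (clamping to the representable range, NaN to zero). The name saturating_to_int is hypothetical, and the sketch assumes that int is at most 32 bits wide, so that INT_MAX and INT_MIN are exactly representable as double.

```c
#include <limits.h>

/* Converts x to int with explicit saturation, so the result does not
   depend on -f[no-]strict-float-cast-overflow. */
int saturating_to_int(double x) {
  if (x != x) /* NaN compares unequal to itself */
    return 0;
  if (x >= (double)INT_MAX)
    return INT_MAX;
  if (x <= (double)INT_MIN)
    return INT_MIN;
  return (int)x; /* in range: truncates toward zero */
}
```

Note that the NaN test relies on NaNs being honored, so it should not be compiled with options such as -ffinite-math-only.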

-f[no-]math-errno

Require math functions to indicate errors by setting errno. The default varies by ToolChain. -fno-math-errno allows optimizations that might cause standard C math functions to not set errno. For example, on some systems, the math function sqrt is specified as setting errno to EDOM when the input is negative. On these systems, the compiler cannot normally optimize a call to sqrt to use inline code (e.g. the x86 sqrtsd instruction) without additional checking to ensure that errno is set appropriately. -fno-math-errno permits these transformations.

On some targets, math library functions never set errno, and so -fno-math-errno is the default. This includes most BSD-derived systems, including Darwin.

-f[no-]trapping-math

Control floating point exception behavior. -fno-trapping-math allows optimizations that assume that floating point operations cannot generate traps such as divide-by-zero, overflow and underflow.

  • The option -ftrapping-math behaves identically to -ffp-exception-behavior=strict.

  • The option -fno-trapping-math behaves identically to -ffp-exception-behavior=ignore. This is the default.

-ffp-contract=<value>

Specify when the compiler is permitted to form fused floating-point operations, such as fused multiply-add (FMA). Fused operations are permitted to produce more precise results than performing the same operations separately.

The C and C++ standards permit intermediate floating-point results within an expression to be computed with more precision than their type would normally allow. This permits operation fusing, and Clang takes advantage of this by default (on). Fusion across statements is not compliant with the C and C++ standards but can be enabled using -ffp-contract=fast.

Fusion can be controlled with the FP_CONTRACT and clang fp contract pragmas. Please note that pragmas will be ignored with -ffp-contract=fast, and refer to the pragma documentation for a description of how the pragmas interact with the different -ffp-contract option values.

Valid values are:

  • fast: enable fusion across statements disregarding pragmas, breaking compliance with the C and C++ standards (default for CUDA).

  • on: enable C and C++ standard compliant fusion in the same statement unless dictated by pragmas (default for languages other than CUDA/HIP)

  • off: disable fusion

  • fast-honor-pragmas: fuse across statements unless dictated by pragmas (default for HIP)
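The FP_CONTRACT pragma mentioned above can be used to pin down contraction for a single function. The sketch below, with hypothetical function names, disables fusion in one function and leaves the other at the command-line setting. It is an illustration of the pragma's placement, not a statement about what code Clang emits for any particular target.

```c
/* a * b + c with contraction explicitly disabled: the product is
   rounded to double before the addition. */
double mul_add_unfused(double a, double b, double c) {
#pragma STDC FP_CONTRACT OFF
  return a * b + c;
}

/* a * b + c under whatever -ffp-contract setting is in effect; with
   "on" or "fast" the compiler may emit a single fused multiply-add. */
double mul_add_default(double a, double b, double c) {
  return a * b + c;
}
```

When the product a * b is exactly representable, both functions return the same value. Otherwise they may differ in the last bit on targets where a fused multiply-add instruction is used. As noted above, the pragma has no effect under -ffp-contract=fast.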

-f[no-]honor-infinities

Allow floating-point optimizations that assume arguments and results are not +-Inf. Defaults to -fhonor-infinities.

If both -fno-honor-infinities and -fno-honor-nans are used, this has the same effect as specifying -ffinite-math-only.

-f[no-]honor-nans

Allow floating-point optimizations that assume arguments and results are not NaNs. Defaults to -fhonor-nans.

If both -fno-honor-infinities and -fno-honor-nans are used, this has the same effect as specifying -ffinite-math-only.

-f[no-]approx-func

Allow certain math function calls (such as log, sqrt, pow, etc) to be replaced with an approximately equivalent set of instructions or alternative math function calls. For example, a pow(x, 0.25) may be replaced with sqrt(sqrt(x)), despite being an inexact result in cases where x is -0.0 or -inf. Defaults to -fno-approx-func.

-f[no-]signed-zeros

Allow optimizations that ignore the sign of floating point zeros. Defaults to -fsigned-zeros.

-f[no-]associative-math

Allow floating point operations to be reassociated. Defaults to -fno-associative-math.

-f[no-]reciprocal-math

Allow division operations to be transformed into multiplication by a reciprocal. This can be significantly faster than an ordinary division but can also have significantly less precision. Defaults to -fno-reciprocal-math.

-f[no-]unsafe-math-optimizations

Allow unsafe floating-point optimizations. -funsafe-math-optimizations also implies:

  • -fapprox-func

  • -fassociative-math

  • -freciprocal-math

  • -fno-signed-zeros

  • -fno-trapping-math

  • -ffp-contract=fast

-fno-unsafe-math-optimizations implies:

  • -fno-approx-func

  • -fno-associative-math

  • -fno-reciprocal-math

  • -fsigned-zeros

  • -ffp-contract=on

There is ambiguity about how -ffp-contract, -funsafe-math-optimizations, and -fno-unsafe-math-optimizations behave when combined. The explanation in -fno-fast-math also applies to these options.

Defaults to -fno-unsafe-math-optimizations.

-f[no-]finite-math-only

Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf. -ffinite-math-only defines the __FINITE_MATH_ONLY__ preprocessor macro. -ffinite-math-only implies:

  • -fno-honor-infinities

  • -fno-honor-nans

-fno-finite-math-only implies:

  • -fhonor-infinities

  • -fhonor-nans

Defaults to -fno-finite-math-only.

-f[no-]rounding-math

Force floating-point operations to honor the dynamically-set rounding mode by default.

The result of a floating-point operation often cannot be exactly represented in the result type and therefore must be rounded. IEEE 754 describes different rounding modes that control how to perform this rounding, not all of which are supported by all implementations. C provides interfaces (fesetround and fesetenv) for dynamically controlling the rounding mode, and while it also recommends certain conventions for changing the rounding mode, these conventions are not typically enforced in the ABI. Since the rounding mode changes the numerical result of operations, the compiler must understand something about it in order to optimize floating point operations.

Note that floating-point operations performed as part of constant initialization are formally performed prior to the start of the program and are therefore not subject to the current rounding mode. This includes the initialization of global variables and local static variables. Floating-point operations in these contexts will be rounded using FE_TONEAREST.

  • The option -fno-rounding-math allows the compiler to assume that the rounding mode is set to FE_TONEAREST. This is the default.

  • The option -frounding-math forces the compiler to honor the dynamically-set rounding mode. This prevents optimizations which might affect results if the rounding mode changes or is different from the default; for example, it prevents floating-point operations from being reordered across most calls and prevents constant-folding when the result is not exactly representable.

-ffp-model=<value>

Specify floating point behavior. -ffp-model is an umbrella option that encompasses functionality provided by other, single purpose, floating point options. Valid values are: precise, strict, fast, and aggressive. Details:

  • precise Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (-ffp-contract=on). This is the default behavior. This value resets -fmath-errno to its target-dependent default.

  • strict Enables -frounding-math and -ffp-exception-behavior=strict, and disables contractions (FMA). All of the -ffast-math enablements are disabled. Enables STDC FENV_ACCESS: by default FENV_ACCESS is disabled. This option setting behaves as though #pragma STDC FENV_ACCESS ON appeared at the top of the source file.

  • fast Behaves identically to specifying -funsafe-math-optimizations, -fno-math-errno, -fcomplex-arithmetic=promoted, and -ffp-contract=fast

  • aggressive Behaves identically to specifying both -ffast-math and -ffp-contract=fast

Note: If your command line specifies multiple instances of the -ffp-model option, or if your command line specifies -ffp-model and later on the command line selects a floating point option that has the effect of negating part of the -ffp-model that has been selected, then the compiler will issue a diagnostic warning that the override has occurred.

-ffp-exception-behavior=<value>

Specify the floating-point exception behavior.

Valid values are: ignore, maytrap, and strict. The default value is ignore. Details:

  • ignore The compiler assumes that the exception status flags will not be read and that floating point exceptions will be masked.

  • maytrap The compiler avoids transformations that may raise exceptions that would not have been raised by the original code. Constant folding performed by the compiler is exempt from this option.

  • strict The compiler ensures that all transformations strictly preserve the floating point exception semantics of the original code.

-ffp-eval-method=<value>

Specify the floating-point evaluation method for intermediate results within a single expression of the code.

Valid values are: source, double, and extended. For 64-bit targets, the default value is source. For 32-bit x86 targets, however, in the case of NetBSD 6.99.26 and under, the default value is double; in the case of NetBSD greater than 6.99.26, with no SSE the default value is extended, and with SSE the default value is source. Details:

  • source The compiler uses the floating-point type declared in the source program as the evaluation method.

  • double The compiler uses double as the floating-point evaluation method for all float expressions of a type that is narrower than double.

  • extended The compiler uses long double as the floating-point evaluation method for all float expressions of a type that is narrower than long double.
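The evaluation method in effect can be inspected through the standard FLT_EVAL_METHOD macro from <float.h>. The sketch below maps its value to a description. The function name is hypothetical, and the correspondence between the macro values and the option values above (0 for source, 1 for double, 2 for extended) is an assumption based on the C standard's definition of the macro rather than something stated in this manual.

```c
#include <float.h>

/* Describes the floating-point evaluation method reported by the
   compiler for this translation unit. */
const char *eval_method_name(void) {
  switch (FLT_EVAL_METHOD) {
  case 0:
    return "source"; /* operations are evaluated in their declared type */
  case 1:
    return "double"; /* float operations are evaluated as double */
  case 2:
    return "extended"; /* all operations are evaluated as long double */
  default:
    return "indeterminate or implementation-defined";
  }
}
```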

-f[no-]protect-parens

This option pertains to floating-point types, complex types with floating-point components, and vectors of these types. Some arithmetic expression transformations that are mathematically correct and permissible according to the C and C++ language standards may be incorrect when dealing with floating-point types, such as reassociation and distribution. Further, the optimizer may ignore parentheses when computing arithmetic expressions in circumstances where the parenthesized and unparenthesized expressions express the same mathematical value. For example, (a+b)+c is the same mathematical value as a+(b+c), but the optimizer is free to evaluate the additions in any order regardless of the parentheses. When enabled, this option forces the optimizer to honor the order of operations with respect to parentheses in all circumstances. Defaults to -fno-protect-parens.

Note that floating-point contraction (option -ffp-contract=) is disabled when -fprotect-parens is enabled. Also note that in safe floating-point modes, such as -ffp-model=precise or -ffp-model=strict, this option has no effect because the optimizer is prohibited from making unsafe transformations.
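The reassociation issue can be seen with concrete values. In the sketch below (function names are hypothetical), the two groupings of the same three operands give different results in IEEE 754 double arithmetic, because adding 1.0 to -1e16 is rounded away. An optimizer that is allowed to reassociate may turn either function into the other.

```c
/* Evaluates (a + b) + c in the order written. */
double sum_left(double a, double b, double c) {
  return (a + b) + c;
}

/* Evaluates a + (b + c) in the order written. */
double sum_right(double a, double b, double c) {
  return a + (b + c);
}
```

With a = 1e16, b = -1e16, and c = 1.0, and with no reassociation on a target that evaluates double expressions in double precision, sum_left yields 1.0 while sum_right yields 0.0.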

-fexcess-precision:

The C and C++ standards allow floating-point expressions to be computed as if intermediate results had more precision (and/or a wider range) than the type of the expression strictly allows. This is called excess precision arithmetic. Excess precision arithmetic can improve the accuracy of results (although not always), and it can make computation significantly faster if the target lacks direct hardware support for arithmetic in a particular type. However, it can also undermine strict floating-point reproducibility.

Under the standards, assignments and explicit casts force the operand to be converted to its formal type, discarding any excess precision. Because data can only flow between statements via an assignment, this means that the use of excess precision arithmetic is a reliable local property of a single statement, and results do not change based on optimization. However, when excess precision arithmetic is in use, Clang does not guarantee strict reproducibility, and future compiler releases may recognize more opportunities to use excess precision arithmetic, e.g. with floating-point builtins.

Clang does not use excess precision arithmetic for most types or on most targets. For example, even on pre-SSE X86 targets where float and double computations must be performed in the 80-bit X87 format, Clang rounds all intermediate results correctly for their type. Clang currently uses excess precision arithmetic by default only for the following types and targets:

  • _Float16 on X86 targets without AVX512-FP16.

The -fexcess-precision=<value> option can be used to control the use of excess precision arithmetic. Valid values are:

  • standard - The default. Allow the use of excess precision arithmetic under the constraints of the C and C++ standards. Has no effect except on the types and targets listed above.

  • fast - Accepted for GCC compatibility, but currently treated as an alias for standard.

  • 16 - Forces _Float16 operations to be emitted without using excess precision arithmetic.

-fcomplex-arithmetic=<value>:

This option specifies the implementation for complex multiplication and division.

Valid values are: basic, improved, full and promoted.

  • basic - Implementation of complex division and multiplication using algebraic formulas at source precision. No special handling to avoid overflow. NaN and infinite values are not handled.

  • improved - Implementation of complex division using the Smith algorithm at source precision. See SMITH, R. L. Algorithm 116: Complex division. Commun. ACM 5, 8 (1962). This value offers improved handling for overflow in intermediate calculations, but overflow may still occur. NaN and infinite values are not handled in some cases.

  • full - Implementation of complex division and multiplication using a call to runtime library functions (generally the case, but the backend might sometimes replace the library call if it knows enough about the potential range of the inputs). Overflow and non-finite values are handled by the library implementation. For multiplication, overflow will occur in accordance with normal floating-point rules. This is the default value.

  • promoted - Implementation of complex division using algebraic formulas at higher precision. Overflow is handled. Non-finite values are handled in some cases. If the target does not have native support for a higher precision data type, the implementation for the complex operation using the Smith algorithm will be used. Overflow may still occur in some cases. NaN and infinite values are not handled.
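
The difference between the algebraic formula and Smith’s algorithm can be sketched in plain C. This is an illustration of the two textbook methods, not the code Clang generates; the cplx type and the names div_basic and div_smith are hypothetical.

```c
/* Hypothetical helper type: a complex number as a pair of doubles. */
typedef struct { double re, im; } cplx;

/* Textbook algebraic formula. The products and the denominator
   c*c + d*d can overflow even when the true quotient is small. */
cplx div_basic(cplx x, cplx y) {
  double den = y.re * y.re + y.im * y.im;
  cplx q = { (x.re * y.re + x.im * y.im) / den,
             (x.im * y.re - x.re * y.im) / den };
  return q;
}

/* Smith's algorithm: scale by the ratio of the smaller to the larger
   component of the divisor so that intermediate values stay in range
   more often. */
cplx div_smith(cplx x, cplx y) {
  cplx q;
  double abs_re = y.re < 0 ? -y.re : y.re;
  double abs_im = y.im < 0 ? -y.im : y.im;
  if (abs_re >= abs_im) {
    double r = y.im / y.re;
    double den = y.re + y.im * r;
    q.re = (x.re + x.im * r) / den;
    q.im = (x.im - x.re * r) / den;
  } else {
    double r = y.re / y.im;
    double den = y.re * r + y.im;
    q.re = (x.re * r + x.im) / den;
    q.im = (x.im * r - x.re) / den;
  }
  return q;
}
```

Dividing (1e200 + 1e200i) by itself shows the contrast: div_basic overflows in the denominator and produces NaN, while div_smith returns exactly 1 + 0i.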

-fcx-limited-range:

This option is aliased to -fcomplex-arithmetic=basic. It enables the naive mathematical formulas for complex division and multiplication with no NaN checking of results. The default is -fno-cx-limited-range, aliased to -fcomplex-arithmetic=full. This option is enabled by the -ffast-math option.

-fcx-fortran-rules:

This option is aliased to -fcomplex-arithmetic=improved. It enables the naive mathematical formulas for complex multiplication and enables application of Smith’s algorithm for complex division. See SMITH, R. L. Algorithm 116: Complex division. Commun. ACM 5, 8 (1962). The default is -fno-cx-fortran-rules, aliased to -fcomplex-arithmetic=full.

Accessing the floating point environment

Many targets allow floating point operations to be configured to control things such as how inexact results should be rounded and how exceptional conditions should be handled. This configuration is called the floating point environment. C and C++ restrict access to the floating point environment by default, and the compiler is allowed to assume that all operations are performed in the default environment. When code is compiled in this default mode, operations that depend on the environment (such as floating-point arithmetic and FLT_ROUNDS) may have undefined behavior if the dynamic environment is not the default environment; for example, FLT_ROUNDS may or may not simply return its default value for the target instead of reading the dynamic environment, and floating-point operations may be optimized as if the dynamic environment were the default. Similarly, it is undefined behavior to change the floating point environment in this default mode, for example by calling the fesetround function.

C provides two pragmas to allow code to dynamically modify the floating point environment:

  • #pragma STDC FENV_ACCESS ON allows dynamic changes to the entire floating point environment.

  • #pragma STDC FENV_ROUND FE_DYNAMIC allows dynamic changes to just the floating point rounding mode. This may be more optimizable than FENV_ACCESS ON because the compiler can still ignore the possibility of floating-point exceptions by default.

Both of these can be used either at the start of a block scope, in which case they cover all code in that scope (unless they’re turned off in a child scope), or at the top level in a file, in which case they cover all subsequent function bodies until they’re turned off. Note that it is undefined behavior to enter code that is not covered by one of these pragmas from code that is covered by one of these pragmas unless the floating point environment has been restored to its default state. See the C standard for more information about these pragmas.

The command line option -frounding-math behaves as if the translation unit began with #pragma STDC FENV_ROUND FE_DYNAMIC. The command line option -ffp-model=strict behaves as if the translation unit began with #pragma STDC FENV_ACCESS ON.

Code that just wants to use a specific rounding mode for specific floating point operations can avoid most of the hazards of the dynamic floating point environment by using #pragma STDC FENV_ROUND with a value other than FE_DYNAMIC.

A note about crtfastmath.o

-ffast-math and -funsafe-math-optimizations without the -shared option cause crtfastmath.o to be automatically linked, which adds a static constructor that sets the FTZ/DAZ bits in MXCSR, affecting not only the current compilation unit but all static and shared libraries included in the program. This decision can be overridden by using either the flag -mdaz-ftz or -mno-daz-ftz to respectively link or not link crtfastmath.o.

A note about __FLT_EVAL_METHOD__

__FLT_EVAL_METHOD__ is not defined as a traditional macro, and so it will not appear when dumping preprocessor macros. Instead, the value __FLT_EVAL_METHOD__ expands to is determined at the point of expansion either from the value set by the -ffp-eval-method command line option or from the target. This is because the __FLT_EVAL_METHOD__ macro cannot expand to the correct evaluation method in the presence of a #pragma which alters the evaluation method. An error is issued if __FLT_EVAL_METHOD__ is expanded inside a scope modified by #pragma clang fp eval_method.
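
As a small illustration, the macro can be read like any integer constant at the point where it is expanded. The function names below are hypothetical, and the mapping follows the standard FLT_EVAL_METHOD values; other values are possible on some targets.

```c
/* Returns the evaluation method in effect where the macro is expanded. */
int current_eval_method(void) {
  return __FLT_EVAL_METHOD__;
}

/* Maps the standard FLT_EVAL_METHOD values to a short description. */
const char *describe_eval_method(int method) {
  switch (method) {
  case 0:  return "source";    /* evaluate in the type of the expression */
  case 1:  return "double";    /* float and double evaluated as double */
  case 2:  return "extended";  /* everything evaluated as long double */
  default: return "indeterminable or implementation-defined";
  }
}
```

Because the value is computed at the point of expansion, the result of current_eval_method depends on the -ffp-eval-method setting and the target in effect for this translation unit.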

A note about Floating Point Constant Evaluation

In C, the only place floating point operations are guaranteed to be evaluated during translation is in the initializers of variables of static storage duration, which are all notionally initialized before the program begins executing (and thus before a non-default floating point environment can be entered). But C++ has many more contexts where floating point constant evaluation occurs. Specifically, for static/thread-local variables, the initializer is first evaluated in a constant context, including in the constant floating point environment (just like in C), and then, if that fails, runtime code is emitted to perform the initialization (which might in general be in a different floating point environment).

Consider this example when compiled with -frounding-math:

constexpr float func_01(float x, float y) {
  return x + y;
}
float V1 = func_01(1.0F, 0x0.000001p0F);

The C++ rule is that initializers for static storage duration variables are first evaluated during translation (therefore, in the default rounding mode), and only evaluated at runtime (and therefore in the runtime rounding mode) if the compile-time evaluation fails. This is in line with the C rules; C11 F.8.5 says: “All computation for automatic initialization is done (as if) at execution time; thus, it is affected by any operative modes and raises floating-point exceptions as required by IEC 60559 (provided the state for the FENV_ACCESS pragma is ‘on’). All computation for initialization of objects that have static or thread storage duration is done (as if) at translation time.” C++ generalizes this by adding another phase of initialization (at runtime) if the translation-time initialization fails, but if the translation-time evaluation of the initializer succeeds, it will be treated as a constant initializer.
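
The same translation-time rule can be observed from C. The sketch below uses hypothetical names (kStaticSum, static_sum) and assumes IEEE 754 single precision, where 0x0.000001p0F is 2^-24, exactly half the spacing between 1.0F and the next larger float.

```c
/* Static storage duration: the initializer is evaluated (as if) at
   translation time, in the default round-to-nearest mode. The sum is
   an exact tie, and ties-to-even rounds it down to 1.0F. */
static const float kStaticSum = 1.0F + 0x0.000001p0F;

float static_sum(void) {
  return kStaticSum;
}
```

Since the sum is computed during translation, installing an upward rounding mode at run time does not change the value returned by static_sum.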

Controlling Code Generation

Clang provides a number of ways to control code generation. The options are listed below.

-f[no-]sanitize=check1,check2,...

Turn on runtime checks for various forms of undefined or suspicious behavior.

This option controls whether Clang adds runtime checks for various forms of undefined or suspicious behavior, and is disabled by default. If a check fails, a diagnostic message is produced at runtime explaining the problem. The main checks are:

There are more fine-grained checks available: see the list of specific kinds of undefined behavior that can be detected and the list of control flow integrity schemes.

The -fsanitize= argument must also be provided when linking, in order to link to the appropriate runtime library.

It is not possible to combine more than one of the -fsanitize=address, -fsanitize=thread, and -fsanitize=memory checkers in the same program.
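
As a minimal sketch of the kind of defect these checks target, the hypothetical function below has undefined behavior for large inputs.

```c
/* Signed overflow is undefined behavior in C. Doubling any value
   greater than INT_MAX / 2 triggers it. */
int double_it(int x) {
  return x * 2;
}
```

When the program is compiled and linked with -fsanitize=undefined, a call such as double_it(INT_MAX) would be expected to produce a runtime diagnostic rather than silently continuing with an arbitrary result. The exact wording of the report depends on the Clang version.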

-f[no-]sanitize-recover=check1,check2,...
-f[no-]sanitize-recover[=all]

Controls which checks enabled by the -fsanitize= flag are non-fatal. If a check is fatal, the program will halt after the first error of this kind is detected and the error report is printed.

By default, non-fatal checks are those enabled by UndefinedBehaviorSanitizer, except for -fsanitize=return and -fsanitize=unreachable. Some sanitizers may not support recovery (or not support it by default, e.g. AddressSanitizer), and always crash the program after the issue is detected.

Note that the -fsanitize-trap flag has precedence over this flag. This means that if a check has been configured to trap elsewhere on the command line, or if the check traps by default, this flag will not have any effect unless that sanitizer’s trapping behavior is disabled with -fno-sanitize-trap.

For example, if a command line contains the flags -fsanitize=undefined -fsanitize-trap=undefined, the flag -fsanitize-recover=alignment will have no effect on its own; it will need to be accompanied by -fno-sanitize-trap=alignment.

-f[no-]sanitize-trap=check1,check2,...
-f[no-]sanitize-trap[=all]

Controls which checks enabled by the -fsanitize= flag trap. This option is intended for use in cases where the sanitizer runtime cannot be used (for instance, when building libc or a kernel module), or where the binary size increase caused by the sanitizer runtime is a concern.

This flag is only compatible with control flow integrity schemes and UndefinedBehaviorSanitizer checks other than vptr.

This flag is enabled by default for sanitizers in the cfi group.

-fsanitize-ignorelist=/path/to/ignorelist/file

Disable or modify sanitizer checks for objects (source files, functions, variables, types) listed in the file. See Sanitizer special case list for file format description.

-fno-sanitize-ignorelist

Don’t use an ignorelist file, if one was specified earlier on the command line.

-f[no-]sanitize-coverage=[type,features,...]

Enable simple code coverage in addition to certain sanitizers. See SanitizerCoverage for more details.

-f[no-]sanitize-address-outline-instrumentation

Controls how address sanitizer code is generated. If enabled, instrumentation always uses a function call instead of inlining the code. Turning this option on could reduce the binary size, but might result in worse run-time performance.

See AddressSanitizer for more details.

-f[no-]sanitize-stats

Enable simple statistics gathering for the enabled sanitizers. See SanitizerStats for more details.

-fsanitize-undefined-trap-on-error

Deprecated alias for -fsanitize-trap=undefined.

-fsanitize-cfi-cross-dso

Enable cross-DSO control flow integrity checks. This flag modifies the behavior of sanitizers in the cfi group to allow checking of cross-DSO virtual and indirect calls.

-fsanitize-cfi-icall-generalize-pointers

Generalize pointers in return and argument types in function type signatures checked by Control Flow Integrity indirect call checking. See Control Flow Integrity for more details.

-fsanitize-cfi-icall-experimental-normalize-integers

Normalize integers in return and argument types in function type signatures checked by Control Flow Integrity indirect call checking. See Control Flow Integrity for more details.

This option is currently experimental.

-fsanitize-kcfi-arity

Extends kernel indirect call forward-edge control flow integrity with additional function arity information (for supported targets). See Control Flow Integrity for more details.

-fstrict-vtable-pointers

Enable optimizations based on the strict rules for overwriting polymorphic C++ objects, i.e. the vptr is invariant during an object’s lifetime. This enables better devirtualization. Turned off by default, because it is still experimental.

-fwhole-program-vtables

Enable whole-program vtable optimizations, such as single-implementation devirtualization and virtual constant propagation, for classes with hidden LTO visibility. Requires -flto.

-f[no-]split-lto-unit

Controls splitting the LTO unit into regular LTO and ThinLTO portions, when compiling with -flto=thin. Defaults to false unless -fsanitize=cfi or -fwhole-program-vtables is specified, in which case it defaults to true. Splitting is required with -fsanitize=cfi, and it is an error to disable it via -fno-split-lto-unit. Splitting is optional with -fwhole-program-vtables; however, it enables more aggressive whole program vtable optimizations (specifically virtual constant propagation).

When enabled, vtable definitions and select virtual functions are placed in the split regular LTO module, enabling more aggressive whole program vtable optimizations required for CFI and virtual constant propagation. However, this can increase the LTO link time and memory requirements over pure ThinLTO, as all split regular LTO modules are merged and LTO linked with regular LTO.

-f[no-]unique-source-file-names

When enabled, allows the compiler to assume that each object file passed to the linker has a unique identifier. The identifier for an object file is either the source file path or the value of the argument -funique-source-file-identifier if specified. This is useful for reducing link times when doing ThinLTO in combination with whole-program devirtualization or CFI.

The full source path or identifier passed to the compiler must be unique. This means that, for example, the following is a usage error:

$ cd foo
$ clang -funique-source-file-names -c foo.c
$ cd ../bar
$ clang -funique-source-file-names -c foo.c
$ cd ..
$ clang foo/foo.o bar/foo.o

but this is not:

$ clang -funique-source-file-names -c foo/foo.c -o foo/foo.o
$ clang -funique-source-file-names -c bar/foo.c -o bar/foo.o
$ clang foo/foo.o bar/foo.o

A misuse of this flag may result in a duplicate symbol error at link time.

-funique-source-file-identifier=IDENTIFIER

Used with -funique-source-file-names to specify a source file identifier.

-fforce-emit-vtables

In order to improve devirtualization, forces emitting of vtables even in modules where it isn’t necessary. It causes more inline virtual functions to be emitted.

-fno-assume-sane-operator-new

Don’t assume that C++’s new operator is sane.

This option tells the compiler not to assume that C++’s global new operator will always return a pointer that does not alias any other pointer when the function returns.

-fassume-nothrow-exception-dtor

Assume that an exception object’s destructor will not throw, and generate less code for catch handlers. A throw expression of a type with a potentially-throwing destructor will lead to an error.

By default, Clang assumes that the exception object may have a throwing destructor. For the Itanium C++ ABI, Clang generates a landing pad to destroy local variables and call _Unwind_Resume for the code catch (...) { ... }. This option tells Clang that an exception object’s destructor will not throw and code simplification is possible.

-ftrap-function=[name]

Instruct the code generator to emit a function call to the specified function name for __builtin_trap().

The LLVM code generator translates __builtin_trap() to a trap instruction if it is supported by the target ISA. Otherwise, the builtin is translated into a call to abort. If this option is set, then the code generator will always lower the builtin to a call to the specified function regardless of whether the target ISA has a trap instruction. This option is useful for environments (e.g. deeply embedded) where a trap cannot be properly handled, or when some custom behavior is desired.
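
A minimal sketch of how this might be used follows. The handler name my_trap_handler and the function checked_divide are hypothetical; the handler would be selected by building with -ftrap-function=my_trap_handler.

```c
#include <stdlib.h>

/* Hypothetical handler, selected with -ftrap-function=my_trap_handler.
   It should not return, because the compiler treats the code after
   __builtin_trap() as unreachable. */
void my_trap_handler(void) {
  /* An embedded target might record a fault code or reset here. */
  abort();
}

/* Divides n by d, trapping on a zero divisor. */
int checked_divide(int n, int d) {
  if (d == 0)
    __builtin_trap();
  return n / d;
}
```

Without the option, a zero divisor reaches the target’s trap instruction (or abort); with it, the same source is expected to call my_trap_handler instead.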

-ftls-model=[model]

Select which TLS model to use.

Valid values are: global-dynamic, local-dynamic, initial-exec and local-exec. The default value is global-dynamic. The compiler may use a different model if the selected model is not supported by the target, or if a more efficient model can be used. The TLS model can be overridden per variable using the tls_model attribute.
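
The per-variable override might look like the following sketch. The variable and function names are hypothetical, and the example assumes a target that supports the initial-exec model.

```c
/* Hypothetical per-thread counter. The attribute requests the
   initial-exec model for this variable only, independently of the
   -ftls-model= setting used for the rest of the translation unit. */
static _Thread_local int next_id_value
    __attribute__((tls_model("initial-exec")));

/* Returns 1, 2, 3, ... independently in each thread. */
int next_id(void) {
  return ++next_id_value;
}
```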

-femulated-tls

Select emulated TLS model, which overrides all -ftls-model choices.

In emulated TLS mode, all accesses to TLS variables are converted to calls to __emutls_get_address in the runtime library.

-mhwdiv=[values]

Select the ARM modes (arm or thumb) that support hardware division instructions.

Valid values are: arm, thumb and arm,thumb. This option is used to indicate which mode (arm or thumb) supports hardware division instructions. This only applies to the ARM architecture.

-m[no-]crc

Enable or disable CRC instructions.

This option is used to indicate whether CRC instructions are to be generated. This only applies to the ARM architecture.

CRC instructions are enabled by default on ARMv8.

-mgeneral-regs-only

Generate code which only uses the general purpose registers.

This option restricts the generated code to use general registers only. This only applies to the AArch64 architecture.

-mcompact-branches=[values]

Control the usage of compact branches for MIPSR6.

Valid values are: never, optimal and always. The default value is optimal, which generates compact branches when a delay slot cannot be filled. never disables the usage of compact branches and always generates compact branches whenever possible.

-f[no-]max-type-align=[number]

Instruct the code generator to not enforce a higher alignment than the given number (of bytes) when accessing memory via an opaque pointer or reference. This cap is ignored when directly accessing a variable or when the pointee type has an explicit “aligned” attribute.

The value should usually be determined by the properties of the system allocator. Some builtin types, especially vector types, have very high natural alignments; when working with values of those types, Clang usually wants to use instructions that take advantage of that alignment. However, many system allocators do not promise to return memory that is more than 8-byte or 16-byte-aligned. Use this option to limit the alignment that the compiler can assume for an arbitrary pointer, which may point onto the heap.

This option does not affect the ABI alignment of types; the layout of structs and unions and the value returned by the alignof operator remain the same.

This option can be overridden on a case-by-case basis by putting an explicit “aligned” alignment on a struct, union, or typedef. For example:

#include <immintrin.h>

// Make an aligned typedef of the AVX-512 16-int vector type.
typedef __v16si __aligned_v16si __attribute__((aligned(64)));

void initialize_vector(__aligned_v16si *v) {
  // The compiler may assume that 'v' is 64-byte aligned, regardless of the
  // value of -fmax-type-align.
}

-faddrsig, -fno-addrsig

Controls whether Clang emits an address-significance table into the object file. Address-significance tables allow linkers to implement safe ICF without the false positives that can result from other implementation techniques such as relocation scanning. Address-significance tables are enabled by default on ELF targets when using the integrated assembler. This flag currently only has an effect on ELF targets.

-f[no-]unique-internal-linkage-names

Controls whether Clang emits a unique (best-effort) symbol name for internal linkage symbols. When this option is set, the compiler hashes the main source file path from the command line and appends it to all internal symbols. If a program contains multiple objects compiled with the same command-line source file path, the symbols are not guaranteed to be unique. This option is particularly useful in attributing profile information to the correct function when multiple functions with the same private linkage name exist in the binary.

It should be noted that this option cannot guarantee uniqueness. The following is an example where the names are not unique, because two modules compiled with the same command-line source file path contain symbols with the same private linkage name:

$ cd $P/foo && clang -c -funique-internal-linkage-names name_conflict.c
$ cd $P/bar && clang -c -funique-internal-linkage-names name_conflict.c
$ cd $P && clang foo/name_conflict.o bar/name_conflict.o

-f[no-]basic-block-address-map:

Emits a SHT_LLVM_BB_ADDR_MAP section which includes address offsets for each basic block in the program, relative to the parent function address.

-fbasic-block-sections=[all,list=<arg>,none]

Controls how Clang emits text sections for basic blocks. With values all and list=<arg>, each basic block or a subset of basic blocks can be placed in its own unique section.

With the list=<arg> option, a file containing the subset of basic blocks that need to be placed in unique sections can be specified. The format of the file is as follows. For example, list=spec.txt where spec.txt is the following:

!foo
!!2
!_Z3barv

will place the machine basic block with id 2 in function foo in a unique section. It will also place all basic blocks of function bar in unique sections.

Further, section clusters can also be specified using the list=<arg> option. For example, list=spec.txt where spec.txt contains:

!foo
!!1 !!3 !!5
!!2 !!4 !!6

will create two unique sections for function foo, with the first containing the odd numbered basic blocks and the second containing the even numbered basic blocks.

Basic block sections allow the linker to reorder basic blocks and enable link-time optimizations like whole program inter-procedural basic block reordering.

-fcodegen-data-generate[=<path>]

Emit the raw codegen (CG) data into custom sections in the object file. Currently, this option also combines the raw CG data from the object files into an indexed CG data file specified by the <path>, for LLD MachO only. When the <path> is not specified, default.cgdata is created. The CG data file combines all the outlining instances that occurred locally in each object file.

$ clang -fuse-ld=lld -Oz -fcodegen-data-generate code.cc

For linkers that do not yet support this feature, llvm-cgdata can be used manually to merge this CG data in object files.

$ clang -c -fuse-ld=lld -Oz -fcodegen-data-generate code.cc
$ llvm-cgdata --merge -o default.cgdata code.o

-fcodegen-data-use[=<path>]

Read the codegen data from the specified path to more effectively outline functions across compilation units. When the <path> is not specified, default.cgdata is used. This option can create many identically outlined functions that can be optimized by the conventional linker’s identical code folding (ICF).

$ clang -fuse-ld=lld -Oz -Wl,--icf=safe -fcodegen-data-use code.cc

Strict Aliasing

The C and C++ standards require accesses to objects in memory to use l-values of an appropriate type for the object. This is called strict aliasing or type-based alias analysis. Strict aliasing enhances a variety of powerful memory optimizations, including reordering, combining, and eliminating memory accesses. These optimizations can lead to unexpected behavior in code that violates the strict aliasing rules. For example:

#include <stddef.h>

void advance(size_t *index, double *data) {
  double value = data[*index];
  /* Clang may assume that this store does not change the contents of `data`. */
  *index += 1;
  /* Clang may assume that this store does not change the contents of `index`. */
  data[*index] = value;
  /* Either of these facts may create significant optimization opportunities
     if Clang is able to inline this function. */
}

Strict aliasing can be explicitly enabled with -fstrict-aliasing and disabled with -fno-strict-aliasing. clang-cl defaults to -fno-strict-aliasing. Otherwise, Clang defaults to -fstrict-aliasing.

C and C++ specify slightly different rules for strict aliasing. To improve language interoperability, Clang allows two types to alias if either language would permit it. This includes applying the C++ similar types rule to C, allowing int ** to alias int const * const *. Clang also relaxes the standard aliasing rules in the following ways:

  • All integer types of the same size are permitted to alias each other, including signed and unsigned types.

  • void * is permitted to alias any pointer type, void ** is permitted to alias any pointer to pointer type, and so on.

Code which violates strict aliasing has undefined behavior. A program that works in one version of Clang may not work in another because of changes to the optimizer. Clang provides a TypeSanitizer to help detect violations of the strict aliasing rules, but it is currently still experimental. Code that is known to violate strict aliasing should generally be built with -fno-strict-aliasing if the violation cannot be fixed.

Clang supports several ways to fix a violation of strict aliasing:

  • L-values of the character types char and unsigned char (as well as other types, depending on the standard) are permitted to access objects of any type.

  • Library functions such as memcpy and memset are specified as treating memory as characters and therefore are not limited by strict aliasing. If a value of one type must be reinterpreted as another (e.g. to read the bits of a floating-point number), use memcpy to copy the representation to an object of the destination type. This has no overhead over a direct l-value access because Clang should reliably optimize calls to these functions to use simple loads and stores when they are used with small constant sizes.

  • The attribute may_alias can be added to a typedef to give l-values of that type the same aliasing power as the character types.
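
The memcpy and may_alias approaches can be sketched as follows. The function and typedef names are hypothetical, and the example assumes that float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Copy the object representation instead of casting pointers. With a
   small constant size, the call is expected to compile down to a
   simple load and store. */
uint32_t float_bits_memcpy(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

/* An l-value of this typedef may access objects of any type. */
typedef uint32_t __attribute__((may_alias)) aliasing_u32;

uint32_t float_bits_may_alias(float f) {
  return *(const aliasing_u32 *)&f;
}
```

Both functions return 0x3F800000 for 1.0f. A plain cast to uint32_t * followed by a dereference would instead violate the strict aliasing rules.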

Clang makes a best effort to avoid obvious miscompilations from strict aliasing by only considering type information when it cannot prove that two accesses must refer to the same memory. However, it is not recommended that programmers intentionally rely on this instead of using one of the solutions above because it is too easy for the compiler’s analysis to be blocked in surprising ways.

In Clang 20, Clang strengthened its implementation of strict aliasing for accesses of pointer type. Previously, all accesses of pointer type were permitted to alias each other, but Clang now distinguishes different pointers by their pointee type, except as limited by the relaxations around qualifiers and void * described above. The previous behavior of treating all pointers as aliasing can be restored using -fno-pointer-tbaa.

Profile Guided Optimization

Profile information enables better optimization. For example, knowing that a branch is taken very frequently helps the compiler make better decisions when ordering basic blocks. Knowing that a function foo is called more frequently than another function bar helps the inliner. Optimization levels -O2 and above are recommended for use of profile guided optimization.

Clang supports profile guided optimization with two different kinds of profiling. A sampling profiler can generate a profile with very low runtime overhead, or you can build an instrumented version of the code that collects more detailed profile information. Both kinds of profiles can provide execution counts for instructions in the code and information on branches taken and function invocation.

Regardless of which kind of profiling you use, be careful to collect profiles by running your code with inputs that are representative of the typical behavior. Code that is not exercised in the profile will be optimized as if it is unimportant, and the compiler may make poor optimization choices for code that is disproportionately used while profiling.

Differences Between Sampling and Instrumentation

Although both techniques are used for similar purposes, there are important differences between the two:

  1. Profile data generated with one cannot be used by the other, and there is no conversion tool that can convert one to the other. So, a profile generated via -fprofile-generate or -fprofile-instr-generate must be used with -fprofile-use or -fprofile-instr-use. Similarly, sampling profiles generated by external profilers must be converted and used with -fprofile-sample-use or -fauto-profile.

  2. Instrumentation profile data can be used for code coverage analysis and optimization.

  3. Sampling profiles can only be used for optimization. They cannot be used for code coverage analysis. Although it would be technically possible to use sampling profiles for code coverage, sample-based profiles are too coarse-grained for code coverage purposes; it would yield poor results.

  4. Sampling profiles must be generated by an external tool. The profile generated by that tool must then be converted into a format that can be read by LLVM. The section on sampling profilers describes one of the supported sampling profile formats.

Using Sampling Profilers

Sampling profilers are used to collect runtime information, such as hardware counters, while your application executes. They are typically very efficient and do not incur a large runtime overhead. The sample data collected by the profiler can be used during compilation to determine what the most executed areas of the code are.

Using the data from a sample profiler requires some changes in the way a program is built. Before the compiler can use profiling information, the code needs to execute under the profiler. The following is the usual build cycle when using sample profilers for optimization:

  1. Build the code with source line table information. You can use all the usual build flags that you always build your application with. The only requirement is that DWARF debug info including source line information is generated. This DWARF information is important for the profiler to be able to map instructions back to source line locations. The usefulness of this DWARF information can be improved with the -fdebug-info-for-profiling and -funique-internal-linkage-names options.

    On Linux:

    $ clang++ -O2 -gline-tables-only \
      -fdebug-info-for-profiling -funique-internal-linkage-names \
      code.cc -o code

    While MSVC-style targets default to CodeView debug information, DWARF debug information is required to generate source-level LLVM profiles. Use -gdwarf to include DWARF debug information:

    > clang-cl /O2 -gdwarf -gline-tables-only ^
      /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
      code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf

Note

-funique-internal-linkage-names generates unique names based on given command-line source file paths. If your build system uses absolute source paths and these paths may change between steps 1 and 4, then the uniqued function names may change and result in unused profile data. Consider omitting this option in such cases.

  1. Run the executable under a sampling profiler. The specific profileryou use does not really matter, as long as its output can be convertedinto the format that the LLVM optimizer understands.

    Two such profilers are the Linux Perf profiler(https://perf.wiki.kernel.org/) and Intel’s Sampling Enabling Product (SEP),available as part ofIntel VTune.While Perf is Linux-specific, SEP can be used on Linux, Windows, and FreeBSD.

    The LLVM toolllvm-profgen can convert output of either Perf or SEP. Anexternal project,AutoFDO, alsoprovides acreate_llvm_prof tool which supports Linux Perf output.

    When using Perf:

    $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code

    If the event above is unavailable, branches:u is probably next-best.

    Note the use of the -b flag. This tells Perf to use the Last Branch Record (LBR) to record call chains. While this is not strictly required, it provides better call information, which improves the accuracy of the profile data.

    When using SEP:

    $ sep -start -out code.tb7 -ec BR_INST_RETIRED.NEAR_TAKEN:precise=yes:pdir -lbr no_filter:usr -perf-script brstack -app ./code

    This produces a code.perf.data.script output which can be used with llvm-profgen’s --perfscript input option.

  3. Convert the collected profile data to LLVM’s sample profile format. This is currently supported via the AutoFDO converter create_llvm_prof. Once built and installed, you can convert the perf.data file to LLVM using the command:

    $ create_llvm_prof --binary=./code --out=code.prof

    This will read perf.data and the binary file ./code and emit the profile data in code.prof. Note that if you ran perf without the -b flag, you need to use --use_lbr=false when calling create_llvm_prof.

    Alternatively, the LLVM tool llvm-profgen can also be used to generate the LLVM sample profile:

    $ llvm-profgen --binary=./code --output=code.prof --perfdata=perf.data

    Please note, perf.data must be collected with the -b flag to Linux perf for the above step to work.

    When using SEP the output is in the textual format corresponding to llvm-profgen --perfscript. For example:

    $ llvm-profgen --binary=./code --output=code.prof --perfscript=code.perf.data.script

  4. Build the code again using the collected profile. This step feeds the profile back to the optimizers. This should result in a binary that executes faster than the original one. Note that you are not required to build the code with the exact same arguments that you used in the first step. The only requirement is that you build the code with the same debug info options and -fprofile-sample-use.

    On Linux:

    $ clang++ -O2 -gline-tables-only \
      -fdebug-info-for-profiling -funique-internal-linkage-names \
      -fprofile-sample-use=code.prof code.cc -o code

    On Windows:

    > clang-cl /O2 -gdwarf -gline-tables-only ^
      /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
      -fprofile-sample-use=code.prof code.cc /Fe:code -fuse-ld=lld /link /debug:dwarf

    [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/edge counters. The profile inference algorithm (profi) can be used to infer missing blocks and edge counts, and improve the quality of profile data. Enable it with -fsample-profile-use-profi. For example, on Linux:

    $ clang++ -fsample-profile-use-profi -O2 -gline-tables-only \
      -fdebug-info-for-profiling -funique-internal-linkage-names \
      -fprofile-sample-use=code.prof code.cc -o code

    On Windows:

    > clang-cl /clang:-fsample-profile-use-profi /O2 -gdwarf -gline-tables-only ^
      /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
      -fprofile-sample-use=code.prof code.cc /Fe:code -fuse-ld=lld /link /debug:dwarf

Sample Profile Formats

Since external profilers generate profile data in a variety of custom formats, the data generated by the profiler must be converted into a format that can be read by the backend. LLVM supports three different sample profile formats:

  1. ASCII text. This is the easiest one to generate. The file is divided into sections, which correspond to each of the functions with profile information. The format is described below. It can also be generated from the binary or gcov formats using the llvm-profdata tool.

  2. Binary encoding. This uses a more efficient encoding that yields smaller profile files. This is the format generated by the create_llvm_prof tool in https://github.com/google/autofdo.

  3. GCC encoding. This is based on the gcov format, which is accepted by GCC. It is only interesting in environments where GCC and Clang co-exist. This encoding is only generated by the create_gcov tool in https://github.com/google/autofdo. It can be read by LLVM and llvm-profdata, but it cannot be generated by either.

If you are using Linux Perf to generate sampling profiles, you can use the conversion tool create_llvm_prof described in the previous section. Otherwise, you will need to write a conversion tool that converts your profiler’s native format into one of these three.

Sample Profile Text Format

This section describes the ASCII text format for sampling profiles. It is, arguably, the easiest one to generate. If you are interested in generating either of the other two, consult the ProfileData library in LLVM’s source tree (specifically, include/llvm/ProfileData/SampleProfReader.h).

function1:total_samples:total_head_samples
 offset1[.discriminator]: number_of_samples [fn1:num fn2:num ... ]
 offset2[.discriminator]: number_of_samples [fn3:num fn4:num ... ]
 ...
 offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]
 offsetA[.discriminator]: fnA:num_of_total_samples
  offsetA1[.discriminator]: number_of_samples [fn7:num fn8:num ... ]
  offsetA1[.discriminator]: number_of_samples [fn9:num fn10:num ... ]
  offsetB[.discriminator]: fnB:num_of_total_samples
   offsetB1[.discriminator]: number_of_samples [fn11:num fn12:num ... ]

This is a nested tree in which the indentation represents the nesting level of the inline stack. There are no blank lines in the file, and the spacing within a single line is fixed. Additional spaces will result in an error while reading the file.

Any line starting with the ‘#’ character is completely ignored.

Inlined calls are represented with indentation. The inline stack is a stack of source locations in which the top of the stack represents the leaf function, and the bottom of the stack represents the actual symbol to which the instruction belongs.

Function names must be mangled in order for the profile loader to match them in the current translation unit. The two numbers in the function header specify how many total samples were accumulated in the function (first number), and the total number of samples accumulated in the prologue of the function (second number). This head sample count provides an indicator of how frequently the function is invoked.

There are two types of lines in the function body.

  • A sampled line represents the profile information of a source location. offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]

  • A callsite line represents the profile information of an inlined callsite. offsetA[.discriminator]: fnA:num_of_total_samples

Each sampled line may contain several items. Some are optional (marked below):

  1. Source line offset. This number represents the line number in the function where the sample was collected. The line number is always relative to the line where the symbol of the function is defined. So, if the function has its header at line 280, the offset 13 is at line 293 in the file.

    Note that this offset should never be a negative number. A negative value could otherwise arise in cases like macros: the debug machinery will register the line number at the point of macro expansion. So, if the macro was expanded in a line before the start of the function, the profile converter should emit a 0 as the offset (this means that the optimizers will not be able to associate a meaningful weight to the instructions in the macro).

  2. [OPTIONAL] Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). DWARF discriminators are unsigned integer values that allow the compiler to distinguish between multiple execution paths on the same source line location.

    For example, consider the line of code if (cond) foo(); else bar();. If the predicate cond is true 80% of the time, then the edge into function foo should be considered to be taken most of the time. But both calls to foo and bar are at the same source line, so a sample count at that line is not sufficient. The compiler needs to know which part of that line is taken more frequently.

    This is what discriminators provide. In this case, the calls to foo and bar will be at the same line, but will have different discriminator values. This allows the compiler to correctly set edge weights into foo and bar.

  3. Number of samples. This is an integer quantity representing the number of samples collected by the profiler at this source location.

  4. [OPTIONAL] Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example,

    130: 7  foo:3  bar:2  baz:7

    The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(), with baz() being the relatively more frequently called target.
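The items of a sampled line listed above can be pulled apart with a small parser. The following Python sketch is illustrative only and is not LLVM's reader: the names SampledLine, parse_sampled_line and absolute_line are hypothetical, leading indentation is stripped (so nesting is not tracked), and whitespace is accepted more leniently than the fixed spacing the real reader requires.

```python
import re
from typing import NamedTuple


class SampledLine(NamedTuple):
    offset: int         # line offset relative to the function's header line
    discriminator: int  # 0 when the optional ".discriminator" part is absent
    samples: int        # number of samples collected at this location
    targets: dict       # optional call targets: name -> sample count


# offset[.discriminator]: number_of_samples [fn:num fn:num ...]
_SAMPLED = re.compile(r"^(\d+)(?:\.(\d+))?: (\d+)((?:\s+\S+:\d+)*)\s*$")


def parse_sampled_line(text: str) -> SampledLine:
    """Split one sampled line into its four items (hypothetical helper)."""
    match = _SAMPLED.match(text.strip())
    if match is None:
        raise ValueError(f"not a sampled line: {text!r}")
    offset, disc, samples, rest = match.groups()
    targets = {}
    for item in rest.split():
        name, _, count = item.rpartition(":")
        targets[name] = int(count)
    return SampledLine(int(offset), int(disc or 0), int(samples), targets)


def absolute_line(header_line: int, offset: int) -> int:
    """Map a relative offset back to a line number in the source file."""
    return header_line + offset


line = parse_sampled_line("130: 7  foo:3  bar:2  baz:7")
hottest = max(line.targets, key=line.targets.get)
print(line.offset, line.samples, hottest)
print(absolute_line(280, 13))
```

The last two lines reuse the two examples from the text: the call-target line at offset 130, and the function whose header is at line 280.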

As an example, consider a program with the call chain main -> foo -> bar. When built with optimizations enabled, the compiler may inline the calls to bar and foo inside main. The generated profile could then be something like this:

main:35504:0
1: _Z3foov:35504
  2: _Z32bari:31977
  1.1: 31977
2: 0

This profile indicates that there were a total of 35,504 samples collected in main. All of those were at line 1 (the call to foo). Of those, 31,977 were spent inside the body of bar. The last line of the profile (2: 0) corresponds to line 2 inside main. No samples were collected there.
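To make the nesting concrete, here is a minimal Python sketch that reads the text format into a tree. It is an illustration under simplifying assumptions, not the LLVM reader: parse_profile and the dictionary layout are hypothetical, nesting is decided purely by comparing leading spaces, call targets on sampled lines are ignored, and spacing is not validated.

```python
import re

_HEADER = re.compile(r"^(\S+):(\d+):(\d+)$")                    # name:total:head
_CALLSITE = re.compile(r"^(\d+)(?:\.(\d+))?: ([^\d\s]\S*):(\d+)$")  # off: fn:total
_SAMPLE = re.compile(r"^(\d+)(?:\.(\d+))?: (\d+)(?:\s|$)")       # off: samples ...


def _node(name, total, head=0):
    return {"name": name, "total": total, "head": head,
            "samples": {}, "inlined": {}}


def parse_profile(text):
    """Return {function name: node}; keys inside a node are (offset, discriminator)."""
    functions = {}
    stack = []  # (indent of the callsite line, node) for each open inline frame
    for raw in text.splitlines():
        line = raw.strip()
        if not line or raw.startswith("#"):
            continue  # comment lines are ignored; blank lines are tolerated here
        header = _HEADER.match(line)
        if header:
            node = _node(header[1], int(header[2]), int(header[3]))
            functions[node["name"]] = node
            stack = [(-1, node)]
            continue
        if not stack:
            raise ValueError(f"body line before any function header: {raw!r}")
        indent = len(raw) - len(raw.lstrip(" "))
        # Close inline frames that are not shallower than the current line.
        while len(stack) > 1 and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        callsite = _CALLSITE.match(line)
        sample = _SAMPLE.match(line)
        if callsite:
            key = (int(callsite[1]), int(callsite[2] or 0))
            child = _node(callsite[3], int(callsite[4]))
            parent["inlined"][key] = child
            stack.append((indent, child))
        elif sample:
            key = (int(sample[1]), int(sample[2] or 0))
            parent["samples"][key] = int(sample[3])
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
    return functions


EXAMPLE = """\
main:35504:0
1: _Z3foov:35504
  2: _Z32bari:31977
  1.1: 31977
2: 0
"""

profile = parse_profile(EXAMPLE)
foo = profile["main"]["inlined"][(1, 0)]
print(foo["name"], foo["total"], profile["main"]["samples"][(2, 0)])
```

A callsite line is told apart from a sampled line by what follows the colon: a function name for a callsite, a number for a sample.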

Profiling with Instrumentation

Clang also supports profiling via instrumentation. This requires building a special instrumented version of the code and has some runtime overhead during the profiling, but it provides more detailed results than a sampling profiler. It also provides reproducible results, at least to the extent that the code behaves consistently across runs.

Clang supports two types of instrumentation: frontend-based and IR-based. Frontend-based instrumentation can be enabled with the option -fprofile-instr-generate, and IR-based instrumentation can be enabled with the option -fprofile-generate. For best performance with PGO, IR-based instrumentation should be used. It has the benefits of lower instrumentation overhead, smaller raw profile size, and better runtime performance. Frontend-based instrumentation, on the other hand, has better source correlation, so it should be used with source line-based coverage testing.

The flag -fcs-profile-generate also instruments programs using the same instrumentation method as -fprofile-generate. However, it performs a post-inline late instrumentation and can produce context-sensitive profiles.

Here are the steps for using profile guided optimization with instrumentation:

  1. Build an instrumented version of the code by compiling and linking with the -fprofile-generate or -fprofile-instr-generate option.

    $ clang++ -O2 -fprofile-instr-generate code.cc -o code

  2. Run the instrumented executable with inputs that reflect the typical usage. By default, the profile data will be written to a default.profraw file in the current directory. You can override that default by using option -fprofile-instr-generate= or by setting the LLVM_PROFILE_FILE environment variable to specify an alternate file. If a non-default file name is specified by both the environment variable and the command line option, the environment variable takes precedence. The file name pattern specified can include different modifiers: %p, %h, %m, %b, %t, and %c.

    Any instance of %p in that file name will be replaced by the process ID, so that you can easily distinguish the profile output from multiple runs.

    $ LLVM_PROFILE_FILE="code-%p.profraw" ./code

    The modifier %h can be used in scenarios where the same instrumented binary is run on multiple different host machines dumping profile data to shared network-based storage. The %h specifier will be substituted with the hostname so that profiles collected from different hosts do not clobber each other.

    While the use of the %p specifier can reduce the likelihood of profiles dumped from different processes clobbering each other, such clobbering can still happen because of pid re-use by the OS. Another side-effect of using %p is that the storage requirement for raw profile data files is greatly increased. To avoid issues like this, the %m specifier can be used in the profile name. When this specifier is used, the profiler runtime will substitute %m with an integer identifier associated with the instrumented binary. Additionally, multiple raw profiles dumped from different processes that share a file system (can be on different hosts) will be automatically merged by the profiler runtime during the dumping. If the program links in multiple instrumented shared libraries, each library will dump the profile data into its own profile data file (with its integer id embedded in the profile name). Note that the merging enabled by %m is for raw profile data generated by the profiler runtime. The resulting merged “raw” profile data file still needs to be converted to a different format expected by the compiler (see step 3 below).

    $ LLVM_PROFILE_FILE="code-%m.profraw" ./code

    Although rare, binary signatures used by the %m specifier can have collisions. In this case, the %b specifier, which expands to the binary ID (build ID in ELF and COFF), can be added. To use it, the program should be compiled with the build ID linker option (--build-id for GNU ld or LLD, /build-id for lld-link on Windows). Linux, Windows and AIX are supported.

    See this section about the %t and %c modifiers.

  3. Combine profiles from multiple runs and convert the “raw” profile format to the input expected by clang. Use the merge command of the llvm-profdata tool to do this.

    $ llvm-profdata merge -output=code.profdata code-*.profraw

    Note that this step is necessary even when there is only one “raw” profile, since the merge operation also changes the file format.

  4. Build the code again using the -fprofile-use or -fprofile-instr-use option to specify the collected profile data.

    $ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code

    You can repeat step 4 as often as you like without regenerating the profile. As you make changes to your code, clang may no longer be able to use the profile data. It will warn you when this happens.
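The %p and %h substitutions described in step 2 can be modelled in a few lines. The Python sketch below is only an illustration of the naming scheme, not the profile runtime's implementation; expand_profile_pattern is a hypothetical helper, and the %m, %b, %t and %c modifiers are deliberately left untouched because their values depend on the runtime and the binary.

```python
import os
import re
import socket


def expand_profile_pattern(pattern, pid=None, hostname=None):
    """Expand %p (process ID) and %h (hostname) in a profile file name pattern."""
    substitutions = {
        "p": str(os.getpid() if pid is None else pid),
        "h": socket.gethostname() if hostname is None else hostname,
    }
    # Only %p and %h are modelled; every other modifier is returned unchanged.
    return re.sub(r"%([ph])", lambda m: substitutions[m.group(1)], pattern)


print(expand_profile_pattern("code-%p.profraw", pid=1234))
print(expand_profile_pattern("%h-%p.profraw", pid=7, hostname="build01"))
```

Passing pid and hostname explicitly keeps the example deterministic; when omitted, the values of the current process and machine are used.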

Note that although the -fprofile-use option is semantically equivalent to its GCC counterpart, it does not handle profile formats produced by GCC. Both -fprofile-use and -fprofile-instr-use accept profiles in the indexed format, regardless of whether they were produced by the frontend or the IR pass.

-fprofile-generate[=<dirname>]

The -fprofile-generate and -fprofile-generate= flags will use an alternative instrumentation method for profile generation. When a directory name is given, the profile file default_%m.profraw is generated in the directory named dirname. If dirname does not exist, it will be created at runtime. The %m specifier will be substituted with a unique id documented in step 2 above. In other words, with the -fprofile-generate[=<dirname>] option, the “raw” profile data automatic merging is turned on by default, so there will no longer be any risk of profile clobbering from different running processes. For example,

$ clang++ -O2 -fprofile-generate=yyy/zzz code.cc -o code

When code is executed, the profile will be written to the file yyy/zzz/default_xxxx.profraw.

To generate the profile data file with the compiler readable format, the llvm-profdata tool can be used with the profile directory as the input:

$ llvm-profdata merge -output=code.profdata yyy/zzz/

If the user wants to turn off the auto-merging feature, or simply override the profile dumping path specified at command line, the environment variable LLVM_PROFILE_FILE can still be used to override the directory and filename for the profile file at runtime. To override the path and filename at compile time, use -Xclang -fprofile-instrument-path=/path/to/file_pattern.profraw.

-fcs-profile-generate[=<dirname>]

The -fcs-profile-generate and -fcs-profile-generate= flags will use the same instrumentation method, and generate the same profile as in the -fprofile-generate and -fprofile-generate= flags. The difference is that the instrumentation is performed after inlining so that the resulting profile has better context-sensitive information. They cannot be used together with the -fprofile-generate and -fprofile-generate= flags. They are typically used in conjunction with the -fprofile-use flag. The profiles generated by -fcs-profile-generate and -fprofile-generate can be merged by llvm-profdata. A use example:

$ clang++ -O2 -fprofile-generate=yyy/zzz code.cc -o code
$ ./code
$ llvm-profdata merge -output=code.profdata yyy/zzz/

The first few steps are the same as in -fprofile-generate compilation. Then perform a second round of instrumentation.

$ clang++ -O2 -fprofile-use=code.profdata -fcs-profile-generate=sss/ttt \
  -o cs_code
$ ./cs_code
$ llvm-profdata merge -output=cs_code.profdata sss/ttt code.profdata

The resulting cs_code.profdata combines code.profdata and the profile generated from the binary cs_code. The profile cs_code.profdata can be used by -fprofile-use compilation.

$ clang++ -O2 -fprofile-use=cs_code.profdata

The above command feeds both profiles to the compiler, each at the same point at which the corresponding instrumentation was performed.

-fprofile-use[=<pathname>]

Without any other arguments, -fprofile-use behaves identically to -fprofile-instr-use. Otherwise, if pathname is the full path to a profile file, it reads from that file. If pathname is a directory name, it reads from pathname/default.profdata.

-fprofile-update[=<method>]

Unless -fsanitize=thread is specified, the default is single, which uses non-atomic increments. The counters can be inaccurate under thread contention. atomic uses atomic increments, which are accurate but have overhead. prefer-atomic will be transformed to atomic when supported by the target, or single otherwise.

-fprofile-continuous

Enables the continuous instrumentation profiling where profile counter updates are continuously synced to a file. This option sets any necessary modifiers (currently %c) in the default profile filename and passes any necessary flags to the middle-end to support this mode. Value profiling is not supported in continuous mode.

$ clang++ -O2 -fprofile-generate -fprofile-continuous code.cc -o code

Running ./code will collect the profile and write it to the default_xxxx.profraw file. However, if ./code abruptly terminates or does not call exit(), in continuous mode the profile collected up to the point of termination will be available in default_xxxx.profraw, while in the non-continuous mode no profile file is generated.

-ftemporal-profile

Enables the temporal profiling extension for IRPGO to improve startup time by reducing .text section page faults. To do this, we instrument function timestamps to measure when each function is called for the first time and use this data to generate a function order to improve startup.

The profile is generated as normal.

$ clang++ -O2 -fprofile-generate -ftemporal-profile code.cc -o code
$ ./code
$ llvm-profdata merge -o code.profdata yyy/zzz

Using the resulting profile, we can generate a function order to pass to the linker via --symbol-ordering-file for ELF or -order_file for Mach-O.

$ llvm-profdata order code.profdata -o code.orderfile
$ clang++ -O2 -Wl,--symbol-ordering-file=code.orderfile code.cc -o code

Or the profile can be passed to LLD directly.

$ clang++ -O2 -fuse-ld=lld -Wl,--irpgo-profile=code.profdata,--bp-startup-sort=function code.cc -o code

For more information, please read the RFC: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
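The idea of turning first-call timestamps into a function order can be illustrated with a deliberately naive Python sketch. It is an illustration of the idea only and makes no claim to match the algorithm used by llvm-profdata order or LLD; the function startup_order, the trace layout (one dictionary per run, mapping a function name to its first-call timestamp) and the sample data are all hypothetical.

```python
def startup_order(traces):
    """Order functions by the earliest first-call timestamp seen in any run."""
    earliest = {}
    for trace in traces:
        for function, timestamp in trace.items():
            if function not in earliest or timestamp < earliest[function]:
                earliest[function] = timestamp
    # Ties are broken by name so the result is deterministic.
    return sorted(earliest, key=lambda function: (earliest[function], function))


# Two hypothetical runs of the same binary.
runs = [
    {"main": 0, "init": 1, "render": 5},
    {"main": 0, "render": 2, "load": 3},
]
print("\n".join(startup_order(runs)))
```

Placing functions that are first called close together in time next to each other in the binary is what reduces .text page faults during startup.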

Fine Tuning Profile Collection

The PGO infrastructure provides user program knobs to fine tune profile collection. Specifically, the PGO runtime provides the following functions that can be used to control the regions in the program where profiles should be collected.

  • void __llvm_profile_set_filename(const char *Name): changes the name of the profile file to Name.

  • void __llvm_profile_reset_counters(void): resets all counters to zero.

  • int __llvm_profile_dump(void): write the profile data to disk.

For example, the following pattern can be used to skip profiling program initialization, profile two specific hot regions, and skip profiling program cleanup:

int main() {
  initialize();

  // Reset all profile counters to 0 to omit profile collected during
  // initialize()'s execution.
  __llvm_profile_reset_counters();
  ... hot region 1
  // Dump the profile for hot region 1.
  __llvm_profile_set_filename("region1.profraw");
  __llvm_profile_dump();

  // Reset counters before proceeding to hot region 2.
  __llvm_profile_reset_counters();
  ... hot region 2
  // Dump the profile for hot region 2.
  __llvm_profile_set_filename("region2.profraw");
  __llvm_profile_dump();

  // Since the profile has been dumped, no further profile data
  // will be collected beyond the above __llvm_profile_dump().
  cleanup();
  return 0;
}

These APIs’ names can be introduced to user programs in two ways. They can be declared as weak symbols on platforms which support treating weak symbols as null during linking. For example, the user can have

__attribute__((weak)) int __llvm_profile_dump(void);

// Then later in the same source file
if (__llvm_profile_dump)
  if (__llvm_profile_dump() != 0) { ... }
// The first if condition tests if the symbol is actually defined.
// Profile dumping only happens if the symbol is defined. Hence,
// the user program works correctly during normal (not profile-generate)
// executions.

Alternatively, the user program can include the header profile/instr_prof_interface.h, which contains the API names. For example,

#include "profile/instr_prof_interface.h"

// Then later in the same source file
if (__llvm_profile_dump() != 0) { ... }

The user code does not need to check if the API names are defined, because these names are automatically replaced by (0) or the equivalent of a no-op if clang is not compiling for profile generation.

Such replacement can happen because clang adds one of two macros depending on the -fprofile-generate and the -fprofile-use flags.

  • __LLVM_INSTR_PROFILE_GENERATE: defined when one of -fprofile[-instr]-generate/-fcs-profile-generate is in effect.

  • __LLVM_INSTR_PROFILE_USE: defined when one of -fprofile-use/-fprofile-instr-use is in effect.

The two macros can be used to provide more flexibility so a user program can execute code specifically intended for profile generate or profile use. For example, a user program can have special logging during profile generate:

#if __LLVM_INSTR_PROFILE_GENERATE
expensive_logging_of_full_program_state();
#endif

The logging is automatically excluded during a normal build of the program, hence it does not impact performance during a normal execution.

It is advised to use such fine tuning only in a program’s cold regions. The weak symbols can introduce extra control flow (the if checks), while the macros (hence declarations they guard in profile/instr_prof_interface.h) can change the control flow of the functions that use them between profile generation and profile use (which can lead to discarded counters in such functions). Using these APIs in the program’s cold regions introduces less overhead and leads to more optimized code.

Disabling Instrumentation

In certain situations, it may be useful to disable profile generation or use for specific files in a build, without affecting the main compilation flags used for the other files in the project.

In these cases, you can use the flag -fno-profile-instr-generate (or -fno-profile-generate) to disable profile generation, and -fno-profile-instr-use (or -fno-profile-use) to disable profile use.

Note that these flags should appear after the corresponding profile flags to have an effect.

Note

When none of the translation units inside a binary is instrumented, in the case of Fuchsia the profile runtime will not be linked into the binary and no profile will be produced, while on other platforms the profile runtime will be linked and a profile will be produced but there will not be any counters.

Instrumenting only selected files or functions

Sometimes it’s useful to only instrument certain files or functions. For example in automated testing infrastructure, it may be desirable to only instrument files or functions that were modified by a patch to reduce the overhead of instrumenting a full system.

This can be done using the -fprofile-list option.

-fprofile-list=<pathname>

This option can be used to apply profile instrumentation only to selected files or functions. pathname should point to a file in the Sanitizer special case list format which selects which files and functions to instrument.

$ clang++ -O2 -fprofile-instr-generate -fprofile-list=fun.list code.cc -o code

The option can be specified multiple times to pass multiple files.

$ clang++ -O2 -fprofile-instr-generate -fcoverage-mapping -fprofile-list=fun.list -fprofile-list=code.list code.cc -o code

Supported sections are [clang], [llvm], [csllvm], and [sample-coldcov] representing clang PGO, IRPGO, CSIRPGO and sample PGO based cold function coverage, respectively. Supported prefixes are function and source. Supported categories are allow, skip, and forbid. skip adds the skipprofile attribute while forbid adds the noprofile attribute to the appropriate function. Use default:<allow|skip|forbid> to specify the default category.

$ cat fun.list
# The following cases are for clang instrumentation.
[clang]

# We might not want to profile functions that are inlined in many places.
function:inlinedLots=skip

# We want to forbid profiling where it might be dangerous.
source:lib/unsafe/*.cc=forbid

# Otherwise we allow profiling.
default:allow

Older Prefixes

An older format is also supported, but it is only able to add the noprofile attribute. To filter individual functions or entire source files use fun:<name> or src:<file> respectively. To exclude a function or a source file, use !fun:<name> or !src:<file> respectively. The format also supports wildcard expansion. The compiler generated functions are assumed to be located in the main source file. It is also possible to restrict the filter to a particular instrumentation type by using a named section.

# all functions whose name starts with foo will be instrumented.
fun:foo*

# except for foo1 which will be excluded from instrumentation.
!fun:foo1

# every function in path/to/foo.cc will be instrumented.
src:path/to/foo.cc

# bar will be instrumented only when using backend instrumentation.
# Recognized section names are clang, llvm and csllvm.
[llvm]
fun:bar

When the file contains only excludes, all files and functions except for the excluded ones will be instrumented. Otherwise, only the files and functions specified will be instrumented.
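The include/exclude rule for the older format can be expressed as a short decision function. The Python sketch below is a simplified model and not Clang's implementation: parse_old_list and is_instrumented are hypothetical names, wildcards are approximated with fnmatch, named sections are not modelled, and it is assumed that an exclude always wins over an include (which is what the foo1 example implies).

```python
from fnmatch import fnmatchcase


def parse_old_list(text):
    """Split an older-format list into (includes, excludes) of (kind, pattern)."""
    includes, excludes = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            raise ValueError("named sections are not modelled in this sketch")
        negated = line.startswith("!")
        kind, _, pattern = line[1 if negated else 0:].partition(":")
        if kind not in ("fun", "src") or not pattern:
            raise ValueError(f"unrecognized entry: {raw!r}")
        (excludes if negated else includes).append((kind, pattern))
    return includes, excludes


def is_instrumented(function, source, includes, excludes):
    """Apply the rule: excludes win; with no includes, everything else is kept."""
    def hit(entries):
        return any(
            fnmatchcase(function if kind == "fun" else source, pattern)
            for kind, pattern in entries
        )

    if hit(excludes):
        return False
    return hit(includes) if includes else True


includes, excludes = parse_old_list("fun:foo*\n!fun:foo1\nsrc:path/to/foo.cc\n")
print(is_instrumented("foobar", "other.cc", includes, excludes))
print(is_instrumented("foo1", "other.cc", includes, excludes))
```

With a list that contains only `!fun:foo1`, the same function reports every function other than foo1 as instrumented, matching the "only excludes" case described above.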

Instrument function groups

Sometimes it is desirable to minimize the size overhead of instrumented binaries. One way to do this is to partition functions into groups and only instrument functions in a specified group. This can be done using the -fprofile-function-groups and -fprofile-selected-function-group options.

-fprofile-function-groups=<N>, -fprofile-selected-function-group=<i>

The following uses 3 groups:

$ clang++ -Oz -fprofile-generate=group_0/ -fprofile-function-groups=3 -fprofile-selected-function-group=0 code.cc -o code.0
$ clang++ -Oz -fprofile-generate=group_1/ -fprofile-function-groups=3 -fprofile-selected-function-group=1 code.cc -o code.1
$ clang++ -Oz -fprofile-generate=group_2/ -fprofile-function-groups=3 -fprofile-selected-function-group=2 code.cc -o code.2

After collecting raw profiles from the three binaries, they can be merged into a single profile like normal.

$ llvm-profdata merge -output=code.profdata group_*/*.profraw
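Why the three binaries together cover every function can be seen from a small model of the partitioning. The Python sketch below assumes that each function is assigned to a group by a stable hash of its name modulo N; the use of zlib.crc32 here is a stand-in chosen for illustration, and function_group and instrumented_in are hypothetical names, so the group numbers it prints should not be expected to match what Clang selects.

```python
import zlib


def function_group(name: str, num_groups: int) -> int:
    """Assign a function to one of num_groups groups (assumed: hash modulo N)."""
    return zlib.crc32(name.encode()) % num_groups


def instrumented_in(names, num_groups, selected):
    """Functions that the binary built for group `selected` would instrument."""
    return [n for n in names if function_group(n, num_groups) == selected]


functions = ["main", "foo", "bar", "baz", "qux"]
groups = [instrumented_in(functions, 3, g) for g in range(3)]
for index, members in enumerate(groups):
    print(index, members)
```

Because every function lands in exactly one group, each binary carries only a fraction of the instrumentation, while the merged profile still has counters for all functions.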

Profile remapping

When the program is compiled after a change that affects many symbol names, pre-existing profile data may no longer match the program. For example:

  • switching from libstdc++ to libc++ will change the mangled names of all functions taking standard library types

  • renaming a widely-used type in C++ will change the mangled names of all functions that have parameters involving that type

  • moving from a 32-bit compilation to a 64-bit compilation may change the underlying type of size_t and similar types, resulting in changes to manglings

Clang allows use of a profile remapping file to specify that such differences in mangled names should be ignored when matching the profile data against the program.

-fprofile-remapping-file=<file>

Specifies a file containing profile remapping information, that will be used to match mangled names in the profile data to mangled names in the program.

The profile remapping file is a text file containing lines of the form

fragmentkind fragment1 fragment2

where fragmentkind is one of name, type, or encoding, indicating whether the following mangled name fragments are <name>s, <type>s, or <encoding>s, respectively. Blank lines and lines starting with # are ignored.

For convenience, built-in <substitution>s such as St and Ss are accepted as <name>s (even though they technically are not <name>s).

For example, to specify that absl::string_view and std::string_view should be treated as equivalent when matching profile data, the following remapping file could be used:

# absl::string_view is considered equivalent to std::string_view
type N4absl11string_viewE St17basic_string_viewIcSt11char_traitsIcEE

# std:: might be std::__1:: in libc++ or std::__cxx11:: in libstdc++
name 3std St3__1
name 3std St7__cxx11

Matching profile data using a profile remapping file is supported on a best-effort basis. For example, information regarding indirect call targets is currently not remapped. For best results, you are encouraged to generate new profile data matching the updated program, or to remap the profile data using the llvm-cxxmap and llvm-profdata merge tools.
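The line format of a remapping file is simple enough to check mechanically before handing the file to the compiler. The Python sketch below only validates the surface syntax described above (three whitespace-separated fields, a known fragment kind, comments and blank lines skipped); parse_remapping_file is a hypothetical helper and it does not check that the fragments are valid Itanium manglings.

```python
KINDS = ("name", "type", "encoding")


def parse_remapping_file(text):
    """Return a list of (fragmentkind, fragment1, fragment2) tuples."""
    rules = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        parts = line.split()
        if len(parts) != 3 or parts[0] not in KINDS:
            raise ValueError(
                f"line {lineno}: expected 'fragmentkind fragment1 fragment2'"
            )
        rules.append(tuple(parts))
    return rules


REMAP = """\
# absl::string_view is considered equivalent to std::string_view
type N4absl11string_viewE St17basic_string_viewIcSt11char_traitsIcEE

# std:: might be std::__1:: in libc++ or std::__cxx11:: in libstdc++
name 3std St3__1
name 3std St7__cxx11
"""

for rule in parse_remapping_file(REMAP):
    print(rule)
```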

Note

Profile data remapping is currently only supported for C++ mangled names following the Itanium C++ ABI mangling scheme. This covers all C++ targets supported by Clang other than Windows.

GCOV-based Profiling

GCOV is a test coverage program that helps to know how often a line of code is executed. When instrumenting the code with the --coverage option, some counters are added for each edge linking basic blocks.

At compile time, gcno files are generated containing information about blocks and edges between them. At runtime the counters are incremented and at exit the counters are dumped in gcda files.

The tool llvm-cov gcov will parse gcno, gcda and source files to generate a report .c.gcov.

-fprofile-filter-files=[regexes]

Define a list of regexes separated by a semi-colon. If a file name matches any of the regexes then the file is instrumented.

$ clang --coverage -fprofile-filter-files=".*\.c$" foo.c

For example, this will only instrument files finishing with .c, skipping .h files.

-fprofile-exclude-files=[regexes]

Define a list of regexes separated by a semi-colon. If a file name doesn’t match any of the regexes then the file is instrumented.

$ clang --coverage -fprofile-exclude-files="^/usr/include/.*$" foo.c

For example, this will instrument all the files except the ones in /usr/include.

If both options are used then a file is instrumented if its name matches any of the regexes from -fprofile-filter-files and doesn’t match any of the regexes from -fprofile-exclude-files.

$ clang --coverage -fprofile-exclude-files="^/usr/include/.*$" \
  -fprofile-filter-files="^/usr/.*$"

In that case /usr/foo/oof.h is instrumented since it matches the filter regex and doesn’t match the exclude regex, but /usr/include/foo.h isn’t since it matches the exclude regex.
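The combined filter and exclude rule can be written as a small predicate. The Python sketch below is a model of the rule as stated above, not Clang's implementation: is_instrumented is a hypothetical helper, Python's re module stands in for whatever regex dialect the compiler uses, and a regex is assumed to match if it matches anywhere in the file name (the examples anchor their patterns with ^ and $, so this assumption does not affect them).

```python
import re


def is_instrumented(path, filter_regexes="", exclude_regexes=""):
    """Decide whether `path` would be instrumented under the two options."""
    filters = [r for r in filter_regexes.split(";") if r]
    excludes = [r for r in exclude_regexes.split(";") if r]
    # With a filter list, the name must match at least one filter regex.
    if filters and not any(re.search(r, path) for r in filters):
        return False
    # The name must not match any exclude regex.
    return not any(re.search(r, path) for r in excludes)


FILTER = "^/usr/.*$"
EXCLUDE = "^/usr/include/.*$"
print(is_instrumented("/usr/foo/oof.h", FILTER, EXCLUDE))
print(is_instrumented("/usr/include/foo.h", FILTER, EXCLUDE))
```

The two calls reproduce the /usr/foo/oof.h and /usr/include/foo.h cases from the text.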

Controlling Debug Information

Controlling Size of Debug Information

The kind of debug info generated by Clang can be set by one of the flags listed below. If multiple flags are present, the last one is used.

-g0

Don’t generate any debug info (default).

-gline-tables-only

Generate line number tables only.

This kind of debug info allows obtaining stack traces with function names, file names and line numbers (by such tools as gdb or addr2line). It doesn’t contain any other data (e.g. description of local variables or function parameters).

-fstandalone-debug

Clang supports a number of optimizations to reduce the size of debug information in the binary. They work based on the assumption that the debug type information can be spread out over multiple compilation units. Specifically, the optimizations are:

  • will not emit type definitions for types that are not needed by a module and could be replaced with a forward declaration.

  • will only emit type info for a dynamic C++ class in the module that contains the vtable for the class.

  • will only emit type info for a C++ class (non-trivial, non-aggregate) in the modules that contain a definition for one of its constructors.

  • will only emit type definitions for types that are the subject of explicit template instantiation declarations in the presence of an explicit instantiation definition for the type.

The -fstandalone-debug option turns off these optimizations. This is useful when working with 3rd-party libraries that don’t come with debug information. Note that Clang will never emit type information for types that are not referenced at all by the program.

-fno-standalone-debug

On Darwin -fstandalone-debug is enabled by default. The -fno-standalone-debug option can be used to turn on the vtable-based optimization described above.

-g

Generate complete debug info.

-feliminate-unused-debug-types

By default, Clang does not emit type information for types that are defined but not used in a program. To retain the debug info for these unused types, the negation -fno-eliminate-unused-debug-types can be used. This can be particularly useful on Windows, when using NATVIS files that can reference const symbols that would otherwise be stripped, even in full debug or standalone debug modes.

Controlling Macro Debug Info Generation

Debug info for C preprocessor macros increases the size of debug information in the binary. Macro debug info generated by Clang can be controlled by the flags listed below.

-fdebug-macro

Generate debug info for preprocessor macros. This flag is discarded when -g0 is enabled.

-fno-debug-macro

Do not generate debug info for preprocessor macros (default).

Controlling Debugger “Tuning”

While Clang generally emits standard DWARF debug info (http://dwarfstd.org), different debuggers may know how to take advantage of different specific DWARF features. You can “tune” the debug info for one of several different debuggers.

-ggdb, -glldb, -gsce, -gdbx

Tune the debug info for the gdb, lldb, Sony PlayStation® debugger, or dbx, respectively. Each of these options implies -g. (Therefore, if you want both -gline-tables-only and debugger tuning, the tuning option must come first.)

Controlling LLVM IR Output

Controlling Value Names in LLVM IR

Emitting value names in LLVM IR increases the size and verbosity of the IR. By default, value names are only emitted in assertion-enabled builds of Clang. However, when reading IR it can be useful to re-enable the emission of value names to improve readability.

-fdiscard-value-names

Discard value names when generating LLVM IR.

-fno-discard-value-names

Do not discard value names when generating LLVM IR. This option can be used to re-enable names for release builds of Clang.

Comment Parsing Options

Clang parses Doxygen and non-Doxygen style documentation comments and attaches them to the appropriate declaration nodes. By default, it only parses Doxygen-style comments and ignores ordinary comments starting with // and /*.

-Wdocumentation

Emit warnings about use of documentation comments. This warning group is off by default.

This includes checking that \param commands name parameters that are actually present in the function signature, checking that \returns is used only on functions that actually return a value, etc.
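As an illustration, the following C function carries a Doxygen-style comment whose \param names agree with the signature and whose \returns is attached to a non-void function. The function clamp_int is a made-up example. Renaming one \param to a name that is not in the signature, or documenting \returns on a void function, is the kind of mistake the checks above are described as reporting.

```c
/**
 * Clamp a value to a closed interval.
 *
 * \param value the number to clamp
 * \param lo    lower bound of the interval
 * \param hi    upper bound of the interval
 * \returns \p value limited to the range [lo, hi]
 */
static int clamp_int(int value, int lo, int hi) {
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}
```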

-Wno-documentation-unknown-command

Don’t warn when encountering an unknown Doxygen command.

-fparse-all-comments

Parse all comments as documentation comments (including ordinary comments starting with // and /*).

-fcomment-block-commands=[commands]

Define custom documentation commands as block commands. This allows Clang to construct the correct AST for these custom commands, and silences warnings about unknown commands. Several commands must be separated by a comma without trailing space; e.g. -fcomment-block-commands=foo,bar defines custom commands \foo and \bar.

It is also possible to use -fcomment-block-commands several times; e.g. -fcomment-block-commands=foo -fcomment-block-commands=bar does the same as above.

CCC_OVERRIDE_OPTIONS

The environment variable CCC_OVERRIDE_OPTIONS can be used to edit clang’s command line arguments. The value of this variable is a space-separated list of edits to perform. The edits are applied in the order in which they appear in CCC_OVERRIDE_OPTIONS. Each edit should be one of the following forms:

  • #: Silence information about the changes to the command line arguments.

  • ^FOO: Add FOO as a new argument at the beginning of the command line right after the name of the compiler executable.

  • +FOO: Add FOO as a new argument at the end of the command line.

  • s/XXX/YYY/: Substitute the regular expression XXX with YYY in the command line.

  • xOPTION: Removes all instances of the literal argument OPTION.

  • XOPTION: Removes all instances of the literal argument OPTION, and the following argument.

  • Ox: Removes all flags matching O or O[sz0-9] and adds Ox at the end of the command line.

This environment variable does not affect the options added by the config files.
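To make the edit forms concrete, here is a small C model of three of them (^FOO, +FOO and xOPTION) applied in order to an argument list. It is only a sketch of the behavior described above, not the driver's code. The names arg_list, insert_arg and apply_edit are hypothetical, and the remaining forms are left out.

```c
#include <string.h>

#define MAX_ARGS 32

/* A tiny argument list: args[0] is the compiler executable name. */
struct arg_list {
    const char *args[MAX_ARGS];
    int count;
};

/* Insert `arg` at position `pos`, shifting later arguments right. */
static void insert_arg(struct arg_list *l, int pos, const char *arg) {
    if (l->count >= MAX_ARGS)
        return; /* list is full; this sketch simply drops the edit */
    if (pos > l->count)
        pos = l->count;
    for (int i = l->count; i > pos; i--)
        l->args[i] = l->args[i - 1];
    l->args[pos] = arg;
    l->count++;
}

/* Apply one edit. Only ^FOO, +FOO and xOPTION are modelled here. */
static void apply_edit(struct arg_list *l, const char *edit) {
    if (edit[0] == '^') {
        insert_arg(l, 1, edit + 1);        /* right after the executable */
    } else if (edit[0] == '+') {
        insert_arg(l, l->count, edit + 1); /* at the end */
    } else if (edit[0] == 'x') {
        int kept = 1;                      /* never remove args[0] */
        for (int i = 1; i < l->count; i++)
            if (strcmp(l->args[i], edit + 1) != 0)
                l->args[kept++] = l->args[i];
        l->count = kept;
    }
}
```

Applying the edits x-g, +-Wall and ^-v in that order to "clang -O2 -g foo.c" yields "clang -v -O2 foo.c -Wall" in this model.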

C Language Features

The support for standard C in clang is feature-complete except for the C99 floating-point pragmas.

Extensions supported by clang

See Clang Language Extensions.

Differences between various standard modes

clang supports the -std option, which changes what language mode clang uses. The supported modes for C are c89, gnu89, c94, c99, gnu99, c11, gnu11, c17, gnu17, c23, gnu23, c2y, gnu2y, and various aliases for those modes. If no -std option is specified, clang defaults to gnu17 mode. Many C99 and C11 features are supported in earlier modes as a conforming extension, with a warning. Use -pedantic-errors to request an error if a feature from a later standard revision is used in an earlier mode.

Differences between all c* and gnu* modes:

  • c* modes define “__STRICT_ANSI__”.

  • Target-specific defines not prefixed by underscores, like linux, are defined in gnu* modes.

  • Trigraphs default to being off in gnu* modes; they can be enabled by the -trigraphs option.

  • The parser recognizes asm and typeof as keywords in gnu* modes; the variants __asm__ and __typeof__ are recognized in all modes.

  • The parser recognizes inline as a keyword in gnu* mode, in addition to recognizing it in the *99 and later modes for which it is part of the ISO C standard. The variant __inline__ is recognized in all modes.

  • The Apple “blocks” extension is recognized by default in gnu* modes on some platforms; it can be enabled in any mode with the -fblocks option.
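A short C sketch of how code can stay portable across the c* and gnu* modes: it uses the double-underscore spellings, which are recognized in every mode, and tests __STRICT_ANSI__ to find out which family of modes is active. The function names are illustrative only.

```c
/* __inline__ and __typeof__ are the spellings recognized in every mode. */
static __inline__ int twice(int v) {
    __typeof__(v) doubled = v * 2; /* same type as v, i.e. int */
    return doubled;
}

/* Report which family of modes this translation unit was compiled in. */
static int compiled_in_strict_mode(void) {
#ifdef __STRICT_ANSI__
    return 1; /* a c* mode such as -std=c17 */
#else
    return 0; /* a gnu* mode such as -std=gnu17 */
#endif
}
```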

Differences between *89 and *94 modes:

  • Digraphs are not recognized in c89 mode.
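For reference, the digraphs are alternative spellings of punctuators: <: and :> stand for [ and ], and <% and %> stand for { and }. The following C function (a made-up example) is written entirely with them, so it is expected to be accepted in modes that recognize digraphs and rejected in c89 mode.

```c
/* Digraph spellings: <: is [, :> is ], <% is {, %> is } */
static int sum3(const int values<:3:>) <%
    return values<:0:> + values<:1:> + values<:2:>;
%>
```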

Differences between *94 and *99 modes:

  • The *99 modes default to implementing inline / __inline__ as specified in C99, while the *89 modes implement the GNU version. This can be overridden for individual functions with the __gnu_inline__ attribute.

  • The scope of names defined inside a for, if, switch, while, or do statement is different. (example: if ((struct x {int x;}*)0) {}.)

  • __STDC_VERSION__ is not defined in *89 modes.

  • inline is not recognized as a keyword in c89 mode.

  • restrict is not recognized as a keyword in *89 modes.

  • Commas are allowed in integer constant expressions in *99 modes.

  • Arrays which are not lvalues are not implicitly promoted to pointers in *89 modes.

  • Some warnings are different.

Differences between *99 and *11 modes:

  • Warnings for use of C11 features are disabled.

  • __STDC_VERSION__ is defined to 201112L rather than 199901L.

Differences between *11 and *17 modes:

  • __STDC_VERSION__ is defined to 201710L rather than 201112L.

Differences between *17 and *23 modes:

  • __STDC_VERSION__ is defined to 202311L rather than 201710L.

  • nullptr and nullptr_t are supported, only in *23 mode.

  • ATOMIC_VAR_INIT is removed from *23 mode.

  • bool, true, false, alignas, alignof, static_assert, and thread_local are now first-class keywords, only in *23 mode.

  • typeof and typeof_unqual are supported, only in *23 mode.

  • Bit-precise integers (_BitInt(N)) are supported by default in *23 mode, and as an extension in *17 and earlier modes.

  • [[]] attributes are supported by default in *23 mode, and as an extension in *17 and earlier modes.
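Because several of these features exist only in *23 mode, code that must build in earlier modes can guard them on __STDC_VERSION__. The sketch below selects between the nullptr keyword and the NULL macro. The macro name NULL_PTR and the function safe_length are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Pick the null pointer spelling that the current mode supports. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define NULL_PTR nullptr /* keyword, only in *23 and later modes */
#else
#define NULL_PTR NULL    /* macro from <stddef.h> in earlier modes */
#endif

/* Return the length of a NUL-terminated string, or 0 for a null pointer. */
static size_t safe_length(const char *s) {
    size_t n = 0;
    if (s == NULL_PTR)
        return 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```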

Differences between *23 and *2y modes:

  • __STDC_VERSION__ is defined to 202400L rather than 202311L.
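The __STDC_VERSION__ values listed in this section can be gathered into a small lookup, sketched here in C. The function names are hypothetical. The fallback "c94" relies on the value 199409L from the C95 amendment, which is not stated above and is labeled here as an assumption.

```c
/* Map a __STDC_VERSION__ value to the name of the matching standard mode. */
static const char *std_name(long version) {
    if (version >= 202400L) return "c2y";
    if (version >= 202311L) return "c23";
    if (version >= 201710L) return "c17";
    if (version >= 201112L) return "c11";
    if (version >= 199901L) return "c99";
    return "c94"; /* assumption: 199409L, the only earlier defined value */
}

/* Name of the mode this file is compiled in; the macro is left undefined
 * in *89 modes. */
static const char *current_std_name(void) {
#ifdef __STDC_VERSION__
    return std_name(__STDC_VERSION__);
#else
    return "c89";
#endif
}
```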

GCC extensions not implemented yet

clang tries to be compatible with gcc as much as possible, but some gcc extensions are not implemented yet:

  • clang does not support decimal floating point types (_Decimal32 and friends) yet.

  • clang does not support nested functions; this is a complex feature which is infrequently used, so it is unlikely to be implemented anytime soon. In C++11 it can be emulated by assigning lambda functions to local variables, e.g.:

    auto const local_function = [&](int parameter) {
      // Do something
    };
    ...
    local_function(1);

  • clang only supports global register variables when the register specified is non-allocatable (e.g. the stack pointer). Support for general global register variables is unlikely to be implemented soon because it requires additional LLVM backend support.

  • clang does not support static initialization of flexible array members. This appears to be a rarely used extension, but could be implemented pending user demand.

  • clang does not support __builtin_va_arg_pack/__builtin_va_arg_pack_len. This is used rarely, but in some potentially interesting places, like the glibc headers, so it may be implemented pending user demand. Note that because clang pretends to be like GCC 4.2, and this extension was introduced in 4.3, the glibc headers will not try to use this extension with clang at the moment.

  • clang does not support the gcc extension for forward-declaring function parameters; this has not shown up in any real-world code yet, though, so it might never be implemented.

This is not a complete list; if you find an unsupported extension missing from this list, please send an e-mail to cfe-dev. This list currently excludes C++; see C++ Language Features. Also, this list does not include bugs in mostly-implemented features; please see the bug tracker for known existing bugs.

Intentionally unsupported GCC extensions

  • clang does not support the gcc extension that allows variable-length arrays in structures. This is for a few reasons: one, it is tricky to implement, two, the extension is completely undocumented, and three, the extension appears to be rarely used. Note that clang does support flexible array members (arrays with a zero or unspecified size at the end of a structure).

  • GCC accepts many expression forms that are not valid integer constant expressions in bit-field widths, enumerator constants, case labels, and in array bounds at global scope. Clang also accepts additional expression forms in these contexts, but constructs that GCC accepts due to simplifications GCC performs while parsing, such as x-x (where x is a variable) will likely never be accepted by Clang.

  • clang does not support __builtin_apply and friends; this extension is extremely obscure and difficult to implement reliably.
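As a reminder of what the supported alternative to variable-length arrays in structures looks like, here is a minimal C sketch of a flexible array member with an unspecified size, allocated dynamically. The type packet and the helper packet_create are made-up names for this example.

```c
#include <stdlib.h>
#include <string.h>

/* A length-prefixed buffer whose payload is a flexible array member. */
struct packet {
    size_t length;
    unsigned char data[]; /* unspecified size, must be the last member */
};

/* Allocate a packet with room for `length` payload bytes and copy them in.
 * The caller releases the result with free(). */
static struct packet *packet_create(const void *bytes, size_t length) {
    struct packet *p = malloc(sizeof(*p) + length);
    if (p == NULL)
        return NULL;
    p->length = length;
    memcpy(p->data, bytes, length);
    return p;
}
```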

Microsoft extensions

clang has support for many extensions from Microsoft Visual C++. To enable these extensions, use the -fms-extensions command-line option. This is the default for Windows targets. Clang does not implement every pragma or declspec provided by MSVC, but the popular ones, such as __declspec(dllexport) and #pragma comment(lib) are well supported.

clang has a -fms-compatibility flag that makes clang accept enough invalid C++ to be able to parse most Microsoft headers. For example, it allows unqualified lookup of dependent base class members, which is a common compatibility issue with clang. This flag is enabled by default for Windows targets.

-fdelayed-template-parsing lets clang delay parsing of function template definitions until the end of a translation unit. This flag is enabled by default for Windows targets.

For compatibility with existing code that compiles with MSVC, clang defines the _MSC_VER and _MSC_FULL_VER macros. When on Windows, these default to either the same value as the currently installed version of cl.exe, or 1933 and 193300000 (respectively). The -fms-compatibility-version= flag overrides these values. It accepts a dotted version tuple, such as 19.00.23506. Changing the MSVC compatibility version makes clang behave more like that version of MSVC. For example, -fms-compatibility-version=19 will enable C++14 features and define char16_t and char32_t as builtin types.
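The relationship between the dotted version and _MSC_VER can be sketched in C. The default 1933 corresponds to version 19.33, so the major and minor parts can be recovered by dividing by 100. The helper names below are hypothetical, and the sketch returns 0 when _MSC_VER is not defined at all.

```c
/* Split an _MSC_VER-style value such as 1933 into major (19) and minor (33). */
static int msc_major(int msc_ver) { return msc_ver / 100; }
static int msc_minor(int msc_ver) { return msc_ver % 100; }

/* Return the compatibility version seen by this translation unit, or 0 when
 * _MSC_VER is not defined (for example on non-Windows targets). */
static int current_msc_ver(void) {
#ifdef _MSC_VER
    return _MSC_VER;
#else
    return 0;
#endif
}
```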

C++ Language Features

clang fully implements all of standard C++98 except for exported templates (which were removed in C++11), all of standard C++11, C++14, and C++17, and most of C++20.

See the C++ support in Clang page for detailed information on C++ feature support across Clang versions.

Controlling implementation limits

-fbracket-depth=N

Sets the limit for nested parentheses, brackets, and braces to N. The default is 256.

-fconstexpr-depth=N

Sets the limit for constexpr function invocations to N. The default is 512.

-fconstexpr-steps=N

Sets the limit for the number of full-expressions evaluated in a single constant expression evaluation. This also controls the maximum size of array and dynamic array allocation that can be constant evaluated. The default is 1048576.

-ftemplate-depth=N

Sets the limit for recursively nested template instantiations to N. The default is 1024.

-foperator-arrow-depth=N

Sets the limit for iterative calls to ‘operator->’ functions to N. The default is 256.

Objective-C Language Features

Objective-C++ Language Features

OpenMP Features

Clang supports all OpenMP 4.5 directives and clauses. See OpenMP Support for additional details.

Use -fopenmp to enable OpenMP. Support for OpenMP can be disabled with -fno-openmp.

Use -fopenmp-simd to enable OpenMP simd features only, without linking the runtime library; for combined constructs (e.g. #pragma omp parallel for simd) the non-simd directives and clauses will be ignored. This can be disabled with -fno-openmp-simd.
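A minimal C example of a combined construct (the function name is made up). As described above, with -fopenmp the loop is expected to be both parallelized and vectorized, with -fopenmp-simd only the simd part applies, and without either flag the pragma is ignored and the loop runs serially with the same result.

```c
/* Sum the squares of 0..n-1 using a combined OpenMP construct. */
static long sum_of_squares(int n) {
    long sum = 0;
#pragma omp parallel for simd reduction(+ : sum)
    for (int i = 0; i < n; i++)
        sum += (long)i * i;
    return sum;
}
```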

Controlling implementation limits

-fopenmp-use-tls

Controls code generation for OpenMP threadprivate variables. In the presence of this option all threadprivate variables are generated the same way as thread local variables, using TLS support. If -fno-openmp-use-tls is provided or the target does not support TLS, code generation for threadprivate variables relies on the OpenMP runtime library.
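For context, a threadprivate variable is declared in C as in the sketch below (the variable and function names are illustrative). How its storage is implemented, TLS or the runtime library, is what the option above selects. Without -fopenmp the pragma is ignored and the variable is an ordinary global.

```c
/* One counter per thread when compiled with -fopenmp; a plain global otherwise. */
static int call_count = 0;
#pragma omp threadprivate(call_count)

/* Increment the calling thread's counter and return its new value. */
static int count_call(void) {
    call_count += 1;
    return call_count;
}
```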

OpenCL Features

Clang can be used to compile OpenCL kernels for execution on a device (e.g. GPU). It is possible to compile the kernel into a binary (e.g. for AMDGPU) that can be uploaded to run directly on a device (e.g. using clCreateProgramWithBinary) or into generic bitcode files loadable into other toolchains.

Compiling to a binary using the default target from the installation can be done as follows:

$ echo "kernel void k(){}" > test.cl
$ clang test.cl

Compiling for a specific target can be done by specifying the triple corresponding to the target, for example:

$ clang --target=nvptx64-unknown-unknown test.cl
$ clang --target=amdgcn-amd-amdhsa -mcpu=gfx900 test.cl

Compiling to bitcode can be done as follows:

$ clang -c -emit-llvm test.cl

This will produce a file test.bc that can be used in vendor toolchains to perform machine code generation.

Note that if compiled to bitcode for generic targets such as SPIR/SPIR-V, portable IR is produced that can be used with various vendor tools as well as open source tools such as SPIRV-LLVM Translator to produce SPIR-V binary. More details are provided in the offline compilation from OpenCL kernel sources into SPIR-V using open source tools. From clang 14 onwards SPIR-V can be generated directly as detailed in the SPIR-V support section.

Clang currently supports OpenCL C language standards up to v2.0. Clang mainly supports full profile. There is only very limited support of the embedded profile. From clang 9 a C++ mode is available for OpenCL (see C++ for OpenCL).

OpenCL v3.0 support is complete but it remains in experimental state; see more details about the experimental features and limitations on the OpenCL Support page.

OpenCL Specific Options

Most of the OpenCL build options from the specification v2.0 section 5.8.4 are available.

Examples:

$ clang -cl-std=CL2.0 -cl-single-precision-constant test.cl

Many flags used for the compilation for C sources can also be passed while compiling for OpenCL, examples: -c, -O<1-4|s>, -o, -emit-llvm, etc.

Some extra options are available to support special OpenCL features.

-cl-no-stdinc

Disables all extra types and functions that are not native to the compiler. This might reduce the compilation time marginally but many declarations from the OpenCL standard will not be accessible. For example, the following will fail to compile.

$ echo "bool is_wg_uniform(int i){return get_enqueued_local_size(i)==get_local_size(i);}" > test.cl
$ clang -cl-std=CL2.0 -cl-no-stdinc test.cl
error: use of undeclared identifier 'get_enqueued_local_size'
error: use of undeclared identifier 'get_local_size'

More information about the standard types and functions is provided in the section on the OpenCL Header.

-cl-ext

Enables/Disables support of OpenCL extensions and optional features. All OpenCL targets set a list of extensions that they support. Clang allows this list to be amended using the -cl-ext flag with a comma-separated list of extensions prefixed with '+' or '-'. The syntax: -cl-ext=<(['-'|'+']<extension>[,])+>, where extensions can be either one of the OpenCL published extensions or any vendor extension. Alternatively, 'all' can be used to enable or disable all known extensions.

Example disabling double support for the 64-bit SPIR-V target:

$ clang -c --target=spirv64 -cl-ext=-cl_khr_fp64 test.cl

Disabling all extensions and then enabling only cl_khr_fp16 for the R600 AMD GPU target can be done using:

$ clang --target=r600 -cl-ext=-all,+cl_khr_fp16 test.cl

Note that some generic targets e.g. SPIR/SPIR-V enable all extensions/features in clang by default.

OpenCL Targets

OpenCL targets are derived from the regular Clang target classes. The OpenCL specific parts of the target representation provide address space mapping as well as a set of supported extensions.

Specific Targets

There is a set of concrete HW architectures that OpenCL can be compiled for.

  • For AMD target:

    $ clang --target=amdgcn-amd-amdhsa -mcpu=gfx900 test.cl

  • For Nvidia architectures:

    $ clang --target=nvptx64-unknown-unknown test.cl

Generic Targets

  • A SPIR-V binary can be produced for 32- or 64-bit targets.

    $ clang --target=spirv32 -c test.cl
    $ clang --target=spirv64 -c test.cl

    More details can be found in the SPIR-V support section.

  • SPIR is available as a generic target to allow portable bitcode to be produced that can be used across GPU toolchains. The implementation follows the SPIR specification. There are two flavors available for 32 and 64 bits.

    $ clang --target=spir test.cl -emit-llvm -c
    $ clang --target=spir64 test.cl -emit-llvm -c

    Clang will generate SPIR v1.2 compatible IR for OpenCL versions up to 2.0 and SPIR v2.0 for OpenCL v2.0 or C++ for OpenCL.

  • x86 is used by some implementations that are x86 compatible and currently remains for backwards compatibility (with older implementations prior to SPIR target support). For “non-SPMD” targets which cannot spawn multiple work-items on the fly using hardware, which covers practically all non-GPU devices such as CPUs and DSPs, additional processing is needed for the kernels to support multiple work-item execution. For this, a 3rd party toolchain, such as for example POCL, can be used.

    This target does not support multiple memory segments and, therefore, the fake address space map can be added using the -ffake-address-space-map flag.

    All known OpenCL extensions and features are set to supported in the generic targets; however, the -cl-ext flag can be used to toggle individual extensions and features.

OpenCL Header

By default Clang will include standard headers and therefore most of the OpenCL builtin functions and types are available during compilation. The default declarations of non-native compiler types and functions can be disabled by using the flag -cl-no-stdinc.

The following example demonstrates that OpenCL kernel sources with various standard builtin functions can be compiled without the need for explicit includes or compiler flags.

$ echo "bool is_wg_uniform(int i){return get_enqueued_local_size(i)==get_local_size(i);}" > test.cl
$ clang -cl-std=CL2.0 test.cl

More information about the default headers is provided in OpenCL Support.

OpenCL Extensions

Most of the cl_khr_* extensions to OpenCL C from the official OpenCL registry are available and configured per target depending on the support available in the specific architecture.

It is possible to alter the default extensions setting per target using the -cl-ext flag. (See flags description for more details).

Vendor extensions can be added flexibly by declaring the list of types and functions associated with each extension enclosed within the following compiler pragma directives:

#pragma OPENCL EXTENSION the_new_extension_name : begin
// declare types and functions associated with the extension here
#pragma OPENCL EXTENSION the_new_extension_name : end

For example, parsing the following code adds the my_t type and the my_func function to the custom my_ext extension.

#pragma OPENCL EXTENSION my_ext : begin
typedef struct{
  int a;
}my_t;
void my_func(my_t);
#pragma OPENCL EXTENSION my_ext : end

There is no conflict resolution for identifier clashes among extensions. It is therefore recommended that the identifiers are prefixed with a double underscore to avoid clashing with user space identifiers. Vendor extensions should use a reserved identifier prefix, e.g. amd, arm, intel.

Clang also supports language extensions documented in The OpenCL C Language Extensions Documentation.

OpenCL-Specific Attributes

OpenCL support in Clang contains a set of attributes taken directly from the specification as well as additional attributes.

See also Attributes in Clang.

nosvm

Clang supports this attribute to comply with OpenCL v2.0 conformance, but it does not have any effect on the IR. For more details refer to the specification section 6.7.2.

opencl_unroll_hint

The implementation of this feature mirrors the unroll hint for C. More details on the syntax can be found in the specification section 6.11.5.

convergent

To make sure no invalid optimizations occur for single program multiple data (SPMD) / single instruction multiple thread (SIMT), Clang provides attributes that can be used for special functions that have cross work item semantics. An example is the subgroup operations such as intel_sub_group_shuffle:

// Define custom my_sub_group_shuffle(data, c)
// that makes use of intel_sub_group_shuffle
r1 = ...
if (r0) r1 = computeA();
// Shuffle data from r1 into r3
// of threads id r2.
r3 = my_sub_group_shuffle(r1, r2);
if (r0) r3 = computeB();

With non-SPMD semantics this is optimized to the following equivalent code:

r1 = ...
if (!r0)
  // Incorrect functionality! The data in r1
  // have not been computed by all threads yet.
  r3 = my_sub_group_shuffle(r1, r2);
else {
  r1 = computeA();
  r3 = my_sub_group_shuffle(r1, r2);
  r3 = computeB();
}

Declaring the function my_sub_group_shuffle with the convergent attribute would prevent this:

my_sub_group_shuffle() __attribute__((convergent));

Using convergent guarantees correct execution by keeping CFG equivalence wrt operations marked as convergent. CFG G' is equivalent to G wrt node Ni if and only if, for all Nj (i≠j), the domination and post-domination relations with respect to Ni remain the same in both G and G'.

noduplicate

noduplicate is more restrictive with respect to optimizations than convergent because a convergent function only preserves CFG equivalence. This allows some optimizations to happen as long as the control flow remains unmodified.

for (int i=0; i<4; i++)
  my_sub_group_shuffle()

can be modified to:

my_sub_group_shuffle();
my_sub_group_shuffle();
my_sub_group_shuffle();
my_sub_group_shuffle();

while using noduplicate would disallow this. Also noduplicate doesn’t have the same safe semantics of CFG as convergent and can cause changes in CFG that modify semantics of the original program.

noduplicate is kept for backwards compatibility only and is considered to be deprecated for future uses.

C++ for OpenCL

Starting from clang 9 kernel code can contain C++17 features: classes, templates, function overloading, type deduction, etc. Please note that this is not an implementation of OpenCL C++ and there is no plan to support it in clang in any new releases in the near future.

Clang currently supports C++ for OpenCL 1.0 and 2021. For detailed information about this language refer to the C++ for OpenCL Programming Language Documentation available in the latest build or in the official release.

To enable the C++ for OpenCL mode, pass one of the following command line options when compiling a .clcpp file:

  • C++ for OpenCL 1.0: -cl-std=clc++, -cl-std=CLC++, -cl-std=clc++1.0, -cl-std=CLC++1.0, -std=clc++, -std=CLC++, -std=clc++1.0 or -std=CLC++1.0.

  • C++ for OpenCL 2021: -cl-std=clc++2021, -cl-std=CLC++2021, -std=clc++2021, -std=CLC++2021.

Example of use:

template<class T> T add( T x, T y )
{
  return x + y;
}

__kernel void test( __global float* a, __global float* b)
{
  auto index = get_global_id(0);
  a[index] = add(b[index], b[index+1]);
}

clang -cl-std=clc++1.0 test.clcpp
clang -cl-std=clc++ -c --target=spirv64 test.cl

By default, files with the .clcpp extension are compiled with the C++ for OpenCL 1.0 mode.

clang test.clcpp

For backward compatibility, files with the .cl extension can also be compiled in C++ for OpenCL mode but the desired language mode must be activated with a flag.

clang -cl-std=clc++ test.cl

Support of C++ for OpenCL 2021 is currently in experimental phase, refer to OpenCL Support for more details.

C++ for OpenCL kernel sources can also be compiled online in drivers supporting the cl_ext_cxx_for_opencl extension.

Constructing and destroying global objects

Global objects with non-trivial constructors require the constructors to be run before the first kernel using the global objects is executed. Similarly global objects with non-trivial destructors require destructor invocation just after the last kernel using the program objects is executed. In OpenCL versions earlier than v2.2 there is no support for invoking global constructors. However, an easy workaround is to manually enqueue the constructor initialization kernel that has the following name scheme _GLOBAL__sub_I_<compiled file name>. This kernel is only present if there are global objects with non-trivial constructors present in the compiled binary. One way to check this is by passing CL_PROGRAM_KERNEL_NAMES to clGetProgramInfo (OpenCL v2.0 s5.8.7) and then checking whether any kernel name matches the naming scheme of global constructor initialization kernel above.

Note that if multiple files are compiled and linked into libraries, multiplekernels that initialize global objects for multiple modules would have to beinvoked.

Applications are currently required to run initialization of global objectsmanually before running any kernels in which the objects are used.

clang -cl-std=clc++ test.cl

If there are any global objects to be initialized, the final binary will contain the _GLOBAL__sub_I_test.cl kernel to be enqueued.
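The name check can be sketched in plain C. The sketch assumes the host has already obtained the kernel name list by querying clGetProgramInfo with CL_PROGRAM_KERNEL_NAMES, which yields a semicolon-separated string. The OpenCL calls themselves are omitted, and has_global_ctor_kernel is a hypothetical helper name.

```c
#include <string.h>

/* Return 1 if the semicolon-separated kernel name list contains a kernel whose
 * name starts with the global constructor prefix, 0 otherwise. */
static int has_global_ctor_kernel(const char *kernel_names) {
    static const char prefix[] = "_GLOBAL__sub_I_";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *name = kernel_names;
    while (name != NULL && *name != '\0') {
        if (strncmp(name, prefix, prefix_len) == 0)
            return 1;
        name = strchr(name, ';');
        if (name != NULL)
            name++; /* skip the separator */
    }
    return 0;
}
```

If the function returns 1, the application would enqueue that kernel once before any kernel that uses the global objects.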

Note that the manual workaround only applies to objects declared at theprogram scope. There is no manual workaround for the construction of staticobjects with non-trivial constructors inside functions.

Global destructors cannot be invoked manually in the OpenCL v2.0 drivers. However, all memory used for program scope objects should be released on clReleaseProgram.

Libraries

Limited experimental support of C++ standard libraries for OpenCL is described on the OpenCL Support page.

Target-Specific Features and Limitations

CPU Architectures Features and Limitations

X86

The support for X86 (both 32-bit and 64-bit) is considered stable onDarwin (macOS), Linux, FreeBSD, and Dragonfly BSD: it has been testedto correctly compile many large C, C++, Objective-C, and Objective-C++codebases.

On x86_64-mingw32, passing i128 (by value) is incompatible with the Microsoft x64 calling convention. You might need to tweak WinX86_64ABIInfo::classify() in lib/CodeGen/Targets/X86.cpp.

For the X86 target, clang supports the -m16 command line argument which enables 16-bit code output. This is broadly similar to using asm(".code16gcc") with the GNU toolchain. The generated code and the ABI remain 32-bit but the assembler emits instructions appropriate for a CPU running in 16-bit mode, with address-size and operand-size prefixes to enable 32-bit addressing and operations.

Several micro-architecture levels as specified by the x86-64 psABI are defined. They are cumulative in the sense that features from previous levels are implicitly included in later levels.

  • -march=x86-64: CMOV, CMPXCHG8B, FPU, FXSR, MMX, SCE, SSE, SSE2

  • -march=x86-64-v2: (close to Nehalem) CMPXCHG16B, LAHF-SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3

  • -march=x86-64-v3: (close to Haswell) AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, XSAVE

  • -march=x86-64-v4: AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL
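Code can estimate at compile time which level it was built for by testing the compiler's predefined feature macros. The C sketch below checks only a subset of each level's features, so it is an approximation. The function name is hypothetical, and the macro names (__AVX2__, __SSE4_2__ and so on) are the usual predefined feature macros rather than anything defined in this section.

```c
/* Highest x86-64 level (1-4) suggested by the feature macros, or 0 when the
 * target is not x86-64. Only part of each level's feature list is checked. */
static int compiled_x86_64_level(void) {
#if !defined(__x86_64__)
    return 0;
#elif defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512CD__) && \
    defined(__AVX512DQ__) && defined(__AVX512VL__)
    return 4;
#elif defined(__AVX2__) && defined(__BMI2__) && defined(__FMA__) && \
    defined(__F16C__) && defined(__LZCNT__) && defined(__MOVBE__)
    return 3;
#elif defined(__SSE4_2__) && defined(__POPCNT__) && defined(__SSSE3__)
    return 2;
#else
    return 1;
#endif
}
```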

Intel AVX10 ISA is a major new vector ISA incorporating the modern vectorization aspects of Intel AVX-512. This ISA will be supported on all future Intel processors. Users are supposed to use the new options -mavx10.N and -mavx10.N-512 on these processors and should not use traditional AVX512 options anymore.

The N in -mavx10.N represents a consecutive integer version number starting from 1. -mavx10.N is an alias of -mavx10.N-256, which enables all instructions within AVX10 version N at a maximum vector length of 256 bits. -mavx10.N-512 enables all instructions at a maximum vector length of 512 bits, which is a superset of the instructions -mavx10.N enables.

Current binaries built with AVX512 features can run on Intel AVX10/512 capable processors without recompiling, but cannot run on AVX10/256 capable processors. Users need to recompile their code with -mavx10.N, and possibly update code that calls 512-bit X86 specific intrinsics or passes or returns 512-bit vector types in function calls, if they want to run on AVX10/256 capable processors. Binaries built with -mavx10.N can run on both AVX10/256 and AVX10/512 capable processors.

Users can add -mno-evex512 to the command line with AVX512 options if they want to run the binary on both legacy AVX512 and new AVX10/256 capable processors. The option has the same constraints as -mavx10.N, i.e., the code cannot call 512-bit X86 specific intrinsics or pass or return 512-bit vector types in function calls.

Users should avoid using AVX512 features in function target attributes when developing code for AVX10. If they have to do so, they need to add an explicit evex512 or no-evex512 together with AVX512 features for 512-bit or non-512-bit functions respectively to avoid unexpected code generation. Both the command line option and the target attribute of the EVEX512 feature can only be used with AVX512. They don’t affect the vector size of AVX10.

Users should not mix AVX10 and AVX512 options, because some option combinations conflict. For example, the combination -mavx512f -mavx10.1-256 doesn’t show a clear intention to the compiler, since the instruction sets of AVX512F and AVX10.1/256 intersect but neither contains the other. In this case, the compiler will emit a warning, but the behavior is well defined: it will generate the same code as the option -mavx10.1-512. A similar case is -mavx512f -mavx10.2-256, which is equivalent to -mavx10.1-512 -mavx10.2-256, because avx10.2-256 implies avx10.1-256 and -mavx512f -mavx10.1-256 is equivalent to -mavx10.1-512.

There are some new macros introduced with AVX10 support. -mavx10.1-256 will enable __AVX10_1__ and __EVEX256__, while -mavx10.1-512 enables __AVX10_1__, __EVEX256__, __EVEX512__ and __AVX10_1_512__. Besides, both -mavx10.1-256 and -mavx10.1-512 will enable all AVX512 feature specific macros. An AVX512 feature will enable __EVEX256__, __EVEX512__ and its own macro. So __EVEX512__ can be used to guard code that can run on both legacy AVX512 and AVX10/512 capable processors but cannot run on AVX10/256, while an AVX512 macro like __AVX512F__ cannot tell the difference among the three options. Users need to check the additional macros __AVX10_1__ and __EVEX512__ if they want to make the distinction.
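The macro checks described here can be combined as in the following C sketch. It only uses the macros named in this section. The function name and the returned strings are illustrative, and the mapping to options follows the description above rather than a tested compiler.

```c
/* Describe the widest EVEX vector length this translation unit may assume. */
static const char *vector_target(void) {
#if defined(__AVX10_1__) && defined(__EVEX512__)
    return "AVX10/512";
#elif defined(__AVX10_1__)
    return "AVX10/256";
#elif defined(__AVX512F__) && defined(__EVEX512__)
    return "AVX512 with 512-bit vectors";
#elif defined(__AVX512F__)
    return "AVX512 without 512-bit vectors";
#else
    return "no AVX512 or AVX10";
#endif
}
```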

ARM

The support for ARM (specifically ARMv6 and ARMv7) is considered stable on Darwin (iOS): it has been tested to correctly compile many large C, C++, Objective-C, and Objective-C++ codebases. Clang only supports a limited number of ARM architectures. It does not yet fully support ARMv5, for example.

PowerPC

The support for PowerPC (especially PowerPC64) is considered stable on Linux and FreeBSD: it has been tested to correctly compile many large C and C++ codebases. PowerPC (32bit) is still missing certain features (e.g. PIC code on ELF platforms).

Other platforms

clang currently contains some support for other architectures (e.g. Sparc); however, significant pieces of code generation are still missing, and they haven’t undergone significant testing.

clang contains limited support for the MSP430 embedded processor, but both the clang support and the LLVM backend support are highly experimental.

Other platforms are completely unsupported at the moment. Adding the minimal support needed for parsing and semantic analysis on a new platform is quite easy; see lib/Basic/Targets.cpp in the clang source tree. This level of support is also sufficient for conversion to LLVM IR for simple programs. Proper support for conversion to LLVM IR requires adding code to lib/CodeGen/CGCall.cpp at the moment; this is likely to change soon, though. Generating assembly requires a suitable LLVM backend.

Operating System Features and Limitations

Windows

Clang has experimental support for targeting “Cygming” (Cygwin / MinGW) platforms.

See also Microsoft Extensions.

Cygwin

Clang works on Cygwin-1.7.

MinGW32

Clang works on some mingw32 distributions. Clang assumes the following directories:

  • C:/mingw/include

  • C:/mingw/lib

  • C:/mingw/lib/gcc/mingw32/4.[3-5].0/include/c++

On MSYS, a few tests might fail.

MinGW-w64

For 32-bit (i686-w64-mingw32) and 64-bit (x86_64-w64-mingw32), Clang assumes the following:

  • GCC versions 4.5.0 to 4.5.3, 4.6.0 to 4.6.2, or 4.7.0 (for the C++ header search path)

  • some_directory/bin/gcc.exe

  • some_directory/bin/clang.exe

  • some_directory/bin/clang++.exe

  • some_directory/bin/../include/c++/GCC_version

  • some_directory/bin/../include/c++/GCC_version/x86_64-w64-mingw32

  • some_directory/bin/../include/c++/GCC_version/i686-w64-mingw32

  • some_directory/bin/../include/c++/GCC_version/backward

  • some_directory/bin/../x86_64-w64-mingw32/include

  • some_directory/bin/../i686-w64-mingw32/include

  • some_directory/bin/../include

This directory layout is standard for any toolchain you will find on the official MinGW-w64 website.

Clang expects the GCC executable “gcc.exe” compiled for i686-w64-mingw32 (or x86_64-w64-mingw32) to be present on PATH.

Some tests might fail on x86_64-w64-mingw32.

AIX

TOC Data Transformation

TOC data transformation is off by default (-mno-tocdata). When -mtocdata is specified, the TOC data transformation will be applied to all suitable variables with static storage duration, including static data members of classes and block-scope static variables (if not marked as exceptions, see further below).

Suitable variables must:

  • have complete types

  • be independently generated (i.e., not placed in a pool)

  • be at most as large as a pointer

  • not be aligned more strictly than a pointer

  • not be structs containing flexible array members

  • not have internal linkage

  • not have aliases

  • not have section attributes

  • not be thread local storage
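
Two of the criteria in the list above, size and alignment relative to a pointer, can be expressed as compile-time checks. The following is only a sketch: fits_toc_entry and the example types are hypothetical, and the remaining criteria (linkage, aliases, section attributes, thread-local storage, and so on) are decided by the compiler and are not modeled here.

```cpp
// Hypothetical helper: checks only the size and alignment criteria from the
// list. It does not model linkage, aliases, section attributes, TLS, or the
// other requirements.
template <typename T>
constexpr bool fits_toc_entry() {
  return sizeof(T) <= sizeof(void *) && alignof(T) <= alignof(void *);
}

struct Small { char tag; };                    // no larger than a pointer
struct Large { char bytes[64]; };              // larger than a pointer
struct alignas(64) OverAligned { char tag; };  // aligned more strictly than a pointer

static_assert(fits_toc_entry<Small>(), "meets the size and alignment criteria");
static_assert(!fits_toc_entry<Large>(), "too large for a TOC entry");
static_assert(!fits_toc_entry<OverAligned>(), "aligned more strictly than a pointer");
```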

The TOC data transformation results in the variable, not its address, being placed in the TOC. This eliminates the need to load the address of the variable from the TOC.

Note: If the TOC data transformation is applied to a variable whose definition is imported, the linker will generate fixup code for reading or writing to the variable.

When multiple toc-data options are used, the last option used takes effect. For example, -mno-tocdata=g5,g1 -mtocdata=g1,g2 -mno-tocdata=g2 -mtocdata=g3,g4 results in -mtocdata=g1,g3,g4.
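
The "last option wins" rule can be modeled as a left-to-right pass over the options. The following is an illustrative model only, not Clang's implementation: resolve_tocdata is a hypothetical function, and it handles just the -mtocdata= and -mno-tocdata= list forms used in the example.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Illustrative model only (not Clang's implementation): resolve a sequence of
// -mtocdata=<names> / -mno-tocdata=<names> options so that, for each variable
// name, the last option mentioning it wins. Returns the names for which the
// transformation ends up requested.
std::set<std::string> resolve_tocdata(const std::vector<std::string> &options) {
  const std::string on = "-mtocdata=";
  const std::string off = "-mno-tocdata=";
  std::set<std::string> enabled;
  for (const std::string &opt : options) {
    bool enable;
    std::string list;
    if (opt.compare(0, on.size(), on) == 0) {
      enable = true;
      list = opt.substr(on.size());
    } else if (opt.compare(0, off.size(), off) == 0) {
      enable = false;
      list = opt.substr(off.size());
    } else {
      continue;  // other options are outside the scope of this sketch
    }
    // Split the comma-separated list and apply each name in turn.
    std::istringstream names(list);
    std::string name;
    while (std::getline(names, name, ',')) {
      if (name.empty()) continue;
      if (enable)
        enabled.insert(name);
      else
        enabled.erase(name);
    }
  }
  return enabled;
}
```

Feeding the four options from the example into resolve_tocdata leaves g1, g3 and g4 enabled, matching the stated result.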

Names of variables not having external linkage will be ignored.

Options:

-mno-tocdata

This is the default behaviour. Only variables explicitly specified with -mtocdata= will have the TOC data transformation applied.

-mtocdata

Apply the TOC data transformation to all suitable variables with static storage duration (including static data members of classes and block-scope static variables) that are not explicitly specified with -mno-tocdata=.

-mno-tocdata=

Can be used in conjunction with -mtocdata to mark the comma-separated list of external linkage variables, specified using their mangled names, as exceptions to -mtocdata.

-mtocdata=

Apply the TOC data transformation to the comma-separated list of external linkage variables, specified using their mangled names, if they are suitable. Emit diagnostics for all unsuitable variables specified.

Default Visibility Export Mapping

The -mdefault-visibility-export-mapping= option can be used to control mapping of default visibility to an explicit shared object export (i.e. XCOFF exported visibility). Three values are provided for the option:

  • -mdefault-visibility-export-mapping=none: no additional export information is created for entities with default visibility.

  • -mdefault-visibility-export-mapping=explicit: mark entities for export if they have explicit (e.g. via an attribute) default visibility from the source, including RTTI.

  • -mdefault-visibility-export-mapping=all: set XCOFF exported visibility for all entities with default visibility from any source. This gives an export behavior similar to ELF platforms where all entities with default visibility are exported.
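
The difference between the explicit and all settings comes down to where an entity's default visibility originates. The sketch below shows the two cases side by side; it assumes a GCC/Clang-compatible compiler for the attribute syntax, and the macro and function names are hypothetical.

```cpp
// Assumes a GCC/Clang-compatible compiler for the attribute syntax; the macro
// and function names are hypothetical.
#if defined(__GNUC__)
#define SKETCH_DEFAULT_VIS __attribute__((visibility("default")))
#else
#define SKETCH_DEFAULT_VIS
#endif

// Explicit default visibility from the source: per the descriptions above,
// marked for export under both =explicit and =all.
SKETCH_DEFAULT_VIS int explicitly_visible() { return 1; }

// Default visibility only because nothing says otherwise (assuming no
// visibility option or pragma changes it): per the descriptions above,
// exported under =all but not under =explicit.
int implicitly_visible() { return 2; }
```

Under none, neither function receives additional export information.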

SPIR-V support

Clang supports generation of SPIR-V conformant to the OpenCL Environment Specification.

To generate SPIR-V binaries, Clang uses the in-tree LLVM SPIR-V backend.

Example usage for OpenCL kernel compilation:

$ clang --target=spirv32 -c test.cl
$ clang --target=spirv64 -c test.cl

Both invocations of Clang will result in the generation of a SPIR-V binary file test.o for 32 bit and 64 bit respectively. This file can be imported by an OpenCL driver that supports SPIR-V consumption or it can be compiled further by offline SPIR-V consumer tools.

Generating SPIR-V at optimization levels other than -O0 is currently available as an experimental feature and is not guaranteed to work in all cases.

Linking is done using spirv-link from the SPIRV-Tools project. Similar to other external linkers, Clang will expect spirv-link to be installed separately and to be present in the PATH environment variable. Please refer to the build and installation instructions.

$ clang --target=spirv64 test1.cl test2.cl

More information about the SPIR-V target settings and supported versions of SPIR-V format can be found in the SPIR-V target guide.

clang-cl

clang-cl is an alternative command-line interface to Clang, designed for compatibility with the Visual C++ compiler, cl.exe.

To enable clang-cl to find system headers, libraries, and the linker when run from the command-line, it should be executed inside a Visual Studio Native Tools Command Prompt or a regular Command Prompt where the environment has been set up using e.g. vcvarsall.bat.

clang-cl can also be used from inside Visual Studio by selecting the LLVM Platform Toolset. The toolset is not part of the installer, but may be installed separately from the Visual Studio Marketplace. To use the toolset, select a project in Solution Explorer, open its Property Page (Alt+F7), and in the “General” section of “Configuration Properties” change “Platform Toolset” to LLVM. Doing so enables an additional Property Page for selecting the clang-cl executable to use for builds.

To use the toolset with MSBuild directly, invoke it with e.g. /p:PlatformToolset=LLVM. This allows trying out the clang-cl toolchain without modifying your project files.

It’s also possible to point MSBuild at clang-cl without changing toolset by passing /p:CLToolPath=c:\llvm\bin /p:CLToolExe=clang-cl.exe.

When using CMake and the Visual Studio generators, the toolset can be set with the -T flag:

cmake -G"Visual Studio 16 2019" -T LLVM ..

When using CMake with the Ninja generator, set the CMAKE_C_COMPILER and CMAKE_CXX_COMPILER variables to clang-cl:

cmake -GNinja -DCMAKE_C_COMPILER="c:/Program Files (x86)/LLVM/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="c:/Program Files (x86)/LLVM/bin/clang-cl.exe" ..

Command-Line Options

To be compatible with cl.exe, clang-cl supports most of the same command-line options. Those options can start with either / or -. It also supports some of Clang’s core options, such as the -W options.

Options that are known to clang-cl, but not currently supported, are ignored with a warning. For example:

clang-cl.exe: warning: argument unused during compilation: '/AI'

To suppress warnings about unused arguments, use the -Qunused-arguments option.

Options that are not known to clang-cl will be ignored by default. Use the -Werror=unknown-argument option in order to treat them as errors. If these options are spelled with a leading /, they will be mistaken for a filename:

clang-cl.exe: error: no such file or directory: '/foobar'

Please file a bug for any valid cl.exe flags that clang-cl does not understand.

Execute clang-cl /? to see a list of supported options:

CL.EXE COMPATIBILITY OPTIONS:
  /?                      Display available options
  /arch:<value>           Set architecture for code generation
  /Brepro-                Emit an object file which cannot be reproduced over time
  /Brepro                 Emit an object file which can be reproduced over time
  /clang:<arg>            Pass <arg> to the clang driver
  /C                      Don't discard comments when preprocessing
  /c                      Compile only
  /d1PP                   Retain macro definitions in /E mode
  /d1reportAllClassLayout Dump record layout information
  /diagnostics:caret      Enable caret and column diagnostics (on by default)
  /diagnostics:classic    Disable column and caret diagnostics
  /diagnostics:column     Disable caret diagnostics but keep column info
  /D <macro[=value]>      Define macro
  /EH<value>              Exception handling model
  /EP                     Disable linemarker output and preprocess to stdout
  /execution-charset:<value>
                          Runtime encoding, supports only UTF-8
  /E                      Preprocess to stdout
  /FA                     Output assembly code file during compilation
  /Fa<file or directory>  Output assembly code to this file during compilation (with /FA)
  /Fe<file or directory>  Set output executable file or directory (ends in / or \)
  /FI <value>             Include file before parsing
  /Fi<file>               Set preprocess output file name (with /P)
  /Fo<file or directory>  Set output object file, or directory (ends in / or \) (with /c)
  /fp:except-
  /fp:except
  /fp:fast
  /fp:precise
  /fp:strict
  /Fp<filename>           Set pch filename (with /Yc and /Yu)
  /GA                     Assume thread-local variables are defined in the executable
  /Gd                     Set __cdecl as a default calling convention
  /GF-                    Disable string pooling
  /GF                     Enable string pooling (default)
  /GR-                    Disable emission of RTTI data
  /Gregcall               Set __regcall as a default calling convention
  /GR                     Enable emission of RTTI data
  /Gr                     Set __fastcall as a default calling convention
  /GS-                    Disable buffer security check
  /GS                     Enable buffer security check (default)
  /Gs                     Use stack probes (default)
  /Gs<value>              Set stack probe size (default 4096)
  /guard:<value>          Enable Control Flow Guard with /guard:cf,
                          or only the table with /guard:cf,nochecks.
                          Enable EH Continuation Guard with /guard:ehcont
  /Gv                     Set __vectorcall as a default calling convention
  /Gw-                    Don't put each data item in its own section
  /Gw                     Put each data item in its own section
  /GX-                    Disable exception handling
  /GX                     Enable exception handling
  /Gy-                    Don't put each function in its own section (default)
  /Gy                     Put each function in its own section
  /Gz                     Set __stdcall as a default calling convention
  /help                   Display available options
  /imsvc <dir>            Add directory to system include search path, as if part of %INCLUDE%
  /I <dir>                Add directory to include search path
  /J                      Make char type unsigned
  /LDd                    Create debug DLL
  /LD                     Create DLL
  /link <options>         Forward options to the linker
  /MDd                    Use DLL debug run-time
  /MD                     Use DLL run-time
  /MTd                    Use static debug run-time
  /MT                     Use static run-time
  /O0                     Disable optimization
  /O1                     Optimize for size  (same as /Og     /Os /Oy /Ob2 /GF /Gy)
  /O2                     Optimize for speed (same as /Og /Oi /Ot /Oy /Ob2 /GF /Gy)
  /Ob0                    Disable function inlining
  /Ob1                    Only inline functions which are (explicitly or implicitly) marked inline
  /Ob2                    Inline functions as deemed beneficial by the compiler
  /Ob3                    Same as /Ob2
  /Od                     Disable optimization
  /Og                     No effect
  /Oi-                    Disable use of builtin functions
  /Oi                     Enable use of builtin functions
  /Os                     Optimize for size (like clang -Os)
  /Ot                     Optimize for speed (like clang -O3)
  /Ox                     Deprecated (same as /Og /Oi /Ot /Oy /Ob2); use /O2 instead
  /Oy-                    Disable frame pointer omission (x86 only, default)
  /Oy                     Enable frame pointer omission (x86 only)
  /O<flags>               Set multiple /O flags at once; e.g. '/O2y-' for '/O2 /Oy-'
  /o <file or directory>  Set output file or directory (ends in / or \)
  /P                      Preprocess to file
  /Qvec-                  Disable the loop vectorization passes
  /Qvec                   Enable the loop vectorization passes
  /showFilenames-         Don't print the name of each compiled file (default)
  /showFilenames          Print the name of each compiled file
  /showIncludes           Print info about included files to stderr
  /source-charset:<value> Source encoding, supports only UTF-8
  /std:<value>            Language standard to compile for
  /TC                     Treat all source files as C
  /Tc <filename>          Specify a C source file
  /TP                     Treat all source files as C++
  /Tp <filename>          Specify a C++ source file
  /utf-8                  Set source and runtime encoding to UTF-8 (default)
  /U <macro>              Undefine macro
  /vd<value>              Control vtordisp placement
  /vmb                    Use a best-case representation method for member pointers
  /vmg                    Use a most-general representation for member pointers
  /vmm                    Set the default most-general representation to multiple inheritance
  /vms                    Set the default most-general representation to single inheritance
  /vmv                    Set the default most-general representation to virtual inheritance
  /volatile:iso           Volatile loads and stores have standard semantics
  /volatile:ms            Volatile loads and stores have acquire and release semantics
  /W0                     Disable all warnings
  /W1                     Enable -Wall
  /W2                     Enable -Wall
  /W3                     Enable -Wall
  /W4                     Enable -Wall and -Wextra
  /Wall                   Enable -Weverything
  /WX-                    Do not treat warnings as errors
  /WX                     Treat warnings as errors
  /w                      Disable all warnings
  /X                      Don't add %INCLUDE% to the include search path
  /Y-                     Disable precompiled headers, overrides /Yc and /Yu
  /Yc<filename>           Generate a pch file for all code up to and including <filename>
  /Yu<filename>           Load a pch file and use it instead of all code up to and including <filename>
  /Z7                     Enable CodeView debug information in object files
  /Zc:char8_t             Enable C++20 char8_t type
  /Zc:char8_t-            Disable C++20 char8_t type
  /Zc:dllexportInlines-   Don't dllexport/dllimport inline member functions of dllexport/import classes
  /Zc:dllexportInlines    dllexport/dllimport inline member functions of dllexport/import classes (default)
  /Zc:sizedDealloc-       Disable C++14 sized global deallocation functions
  /Zc:sizedDealloc        Enable C++14 sized global deallocation functions
  /Zc:strictStrings       Treat string literals as const
  /Zc:threadSafeInit-     Disable thread-safe initialization of static variables
  /Zc:threadSafeInit      Enable thread-safe initialization of static variables
  /Zc:trigraphs-          Disable trigraphs (default)
  /Zc:trigraphs           Enable trigraphs
  /Zc:twoPhase-           Disable two-phase name lookup in templates
  /Zc:twoPhase            Enable two-phase name lookup in templates
  /Zi                     Alias for /Z7. Does not produce PDBs.
  /Zl                     Don't mention any default libraries in the object file
  /Zp                     Set the default maximum struct packing alignment to 1
  /Zp<value>              Specify the default maximum struct packing alignment
  /Zs                     Run the preprocessor, parser and semantic analysis stages

OPTIONS:
  -###                    Print (but do not run) the commands to run for this compilation
  --analyze               Run the static analyzer
  -faddrsig               Emit an address-significance table
  -fansi-escape-codes     Use ANSI escape codes for diagnostics
  -fblocks                Enable the 'blocks' language feature
  -fcf-protection=<value> Instrument control-flow architecture protection. Options: return, branch, full, none.
  -fcf-protection         Enable cf-protection in 'full' mode
  -fcolor-diagnostics     Use colors in diagnostics
  -fcomplete-member-pointers
                          Require member pointer base types to be complete if they would be significant under the Microsoft ABI
  -fcoverage-mapping      Generate coverage mapping to enable code coverage analysis
  -fcrash-diagnostics-dir=<dir>
                          Put crash-report files in <dir>
  -fdebug-macro           Emit macro debug information
  -fdelayed-template-parsing
                          Parse templated function definitions at the end of the translation unit
  -fdiagnostics-absolute-paths
                          Print absolute paths in diagnostics
  -fdiagnostics-parseable-fixits
                          Print fix-its in machine parseable form
  -flto=<value>           Set LTO mode to either 'full' or 'thin'
  -flto                   Enable LTO in 'full' mode
  -fmerge-all-constants   Allow merging of constants
  -fmodule-file=<module_name>=<module-file>
                          Use the specified module file that provides the module <module_name>
  -fmodule-header=<header>
                          Build <header> as a C++20 header unit
  -fmodule-output=<path>
                          Save intermediate module file results when compiling a standard C++ module unit.
  -fms-compatibility-version=<value>
                          Dot-separated value representing the Microsoft compiler version
                          number to report in _MSC_VER (0 = don't define it; default is same value as installed cl.exe, or 1933)
  -fms-compatibility      Enable full Microsoft Visual C++ compatibility
  -fms-extensions         Accept some non-standard constructs supported by the Microsoft compiler
  -fmsc-version=<value>   Microsoft compiler version number to report in _MSC_VER
                          (0 = don't define it; default is same value as installed cl.exe, or 1933)
  -fno-addrsig            Don't emit an address-significance table
  -fno-builtin-<value>    Disable implicit builtin knowledge of a specific function
  -fno-builtin            Disable implicit builtin knowledge of functions
  -fno-complete-member-pointers
                          Do not require member pointer base types to be complete if they would be significant under the Microsoft ABI
  -fno-coverage-mapping   Disable code coverage analysis
  -fno-crash-diagnostics  Disable auto-generation of preprocessed source files and a script for reproduction during a clang crash
  -fno-debug-macro        Do not emit macro debug information
  -fno-delayed-template-parsing
                          Disable delayed template parsing
  -fno-sanitize-address-poison-custom-array-cookie
                          Disable poisoning array cookies when using custom operator new[] in AddressSanitizer
  -fno-sanitize-address-use-after-scope
                          Disable use-after-scope detection in AddressSanitizer
  -fno-sanitize-address-use-odr-indicator
                          Disable ODR indicator globals
  -fno-sanitize-ignorelist Don't use ignorelist file for sanitizers
  -fno-sanitize-cfi-cross-dso
                          Disable control flow integrity (CFI) checks for cross-DSO calls.
  -fno-sanitize-coverage=<value>
                          Disable specified features of coverage instrumentation for Sanitizers
  -fno-sanitize-memory-track-origins
                          Disable origins tracking in MemorySanitizer
  -fno-sanitize-memory-use-after-dtor
                          Disable use-after-destroy detection in MemorySanitizer
  -fno-sanitize-recover=<value>
                          Disable recovery for specified sanitizers
  -fno-sanitize-stats     Disable sanitizer statistics gathering.
  -fno-sanitize-thread-atomics
                          Disable atomic operations instrumentation in ThreadSanitizer
  -fno-sanitize-thread-func-entry-exit
                          Disable function entry/exit instrumentation in ThreadSanitizer
  -fno-sanitize-thread-memory-access
                          Disable memory access instrumentation in ThreadSanitizer
  -fno-sanitize-trap=<value>
                          Disable trapping for specified sanitizers
  -fno-standalone-debug   Limit debug information produced to reduce size of debug binary
  -fno-strict-aliasing    Disable optimizations based on strict aliasing rules (default)
  -fobjc-runtime=<value>  Specify the target Objective-C runtime kind and version
  -fprofile-exclude-files=<value>
                          Instrument only functions from files where names don't match all the regexes separated by a semi-colon
  -fprofile-filter-files=<value>
                          Instrument only functions from files where names match any regex separated by a semi-colon
  -fprofile-generate=<dirname>
                          Generate instrumented code to collect execution counts into a raw profile file in the directory specified by the argument. The filename uses default_%m.profraw pattern
                          (overridden by LLVM_PROFILE_FILE env var)
  -fprofile-generate
                          Generate instrumented code to collect execution counts into default_%m.profraw file
                          (overridden by '=' form of option or LLVM_PROFILE_FILE env var)
  -fprofile-instr-generate=<file_name_pattern>
                          Generate instrumented code to collect execution counts into the file whose name pattern is specified as the argument
                          (overridden by LLVM_PROFILE_FILE env var)
  -fprofile-instr-generate
                          Generate instrumented code to collect execution counts into default.profraw file
                          (overridden by '=' form of option or LLVM_PROFILE_FILE env var)
  -fprofile-instr-use=<value>
                          Use instrumentation data for coverage testing or profile-guided optimization
  -fprofile-use=<value>
                          Use instrumentation data for profile-guided optimization
  -fprofile-remapping-file=<file>
                          Use the remappings described in <file> to match the profile data against names in the program
  -fprofile-list=<file>
                          Filename defining the list of functions/files to instrument
  -fsanitize-address-field-padding=<value>
                          Level of field padding for AddressSanitizer
  -fsanitize-address-globals-dead-stripping
                          Enable linker dead stripping of globals in AddressSanitizer
  -fsanitize-address-poison-custom-array-cookie
                          Enable poisoning array cookies when using custom operator new[] in AddressSanitizer
  -fsanitize-address-use-after-return=<mode>
                          Select the mode of detecting stack use-after-return in AddressSanitizer: never | runtime (default) | always
  -fsanitize-address-use-after-scope
                          Enable use-after-scope detection in AddressSanitizer
  -fsanitize-address-use-odr-indicator
                          Enable ODR indicator globals to avoid false ODR violation reports in partially sanitized programs at the cost of an increase in binary size
  -fsanitize-ignorelist=<value>
                          Path to ignorelist file for sanitizers
  -fsanitize-cfi-cross-dso
                          Enable control flow integrity (CFI) checks for cross-DSO calls.
  -fsanitize-cfi-icall-generalize-pointers
                          Generalize pointers in CFI indirect call type signature checks
  -fsanitize-coverage=<value>
                          Specify the type of coverage instrumentation for Sanitizers
  -fsanitize-hwaddress-abi=<value>
                          Select the HWAddressSanitizer ABI to target (interceptor or platform, default interceptor)
  -fsanitize-memory-track-origins=<value>
                          Enable origins tracking in MemorySanitizer
  -fsanitize-memory-track-origins
                          Enable origins tracking in MemorySanitizer
  -fsanitize-memory-use-after-dtor
                          Enable use-after-destroy detection in MemorySanitizer
  -fsanitize-recover=<value>
                          Enable recovery for specified sanitizers
  -fsanitize-stats        Enable sanitizer statistics gathering.
  -fsanitize-thread-atomics
                          Enable atomic operations instrumentation in ThreadSanitizer (default)
  -fsanitize-thread-func-entry-exit
                          Enable function entry/exit instrumentation in ThreadSanitizer (default)
  -fsanitize-thread-memory-access
                          Enable memory access instrumentation in ThreadSanitizer (default)
  -fsanitize-trap=<value> Enable trapping for specified sanitizers
  -fsanitize-undefined-strip-path-components=<number>
                          Strip (or keep only, if negative) a given number of path components when emitting check metadata.
  -fsanitize=<check>      Turn on runtime checks for various forms of undefined or suspicious
                          behavior. See user manual for available checks
  -fsplit-lto-unit        Enables splitting of the LTO unit.
  -fstandalone-debug      Emit full debug info for all types used by the program
  -fstrict-aliasing       Enable optimizations based on strict aliasing rules
  -fsyntax-only           Run the preprocessor, parser and semantic analysis stages
  -fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
  -gcodeview-ghash        Emit type record hashes in a .debug$H section
  -gcodeview              Generate CodeView debug information
  -gline-directives-only  Emit debug line info directives only
  -gline-tables-only      Emit debug line number tables only
  -miamcu                 Use Intel MCU ABI
  -mllvm <value>          Additional arguments to forward to LLVM's option processing
  -nobuiltininc           Disable builtin #include directories
  -Qunused-arguments      Don't emit warning for unused driver arguments
  -R<remark>              Enable the specified remark
  --target=<value>        Generate code for the given target
  --version               Print version information
  -v                      Show commands to run and use verbose output
  -W<warning>             Enable the specified warning
  -Xclang <arg>           Pass <arg> to the clang compiler
  -Xclangas <arg>         Pass <arg> to the clang assembler

The /clang: Option

When clang-cl is run with a set of /clang:<arg> options, it will gather all of the <arg> arguments and process them as if they were passed to the clang driver. This mechanism allows you to pass flags that are not exposed in the clang-cl options or flags that have a different meaning when passed to the clang driver. Regardless of where they appear in the command line, the /clang: arguments are treated as if they were passed at the end of the clang-cl command line.
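
The ordering rule can be pictured as a simple partition of the command line. The following is a conceptual model only, not the driver's code: effective_args is a hypothetical function, it recognizes only the / spelling of the option, and it ignores the fact that the forwarded arguments are parsed with clang driver semantics rather than clang-cl semantics.

```cpp
#include <string>
#include <vector>

// Conceptual model only (not the driver's code): returns the order in which
// arguments are effectively considered, with every /clang:<arg> payload moved
// to the end of the command line. Relative order within each group is kept.
std::vector<std::string> effective_args(const std::vector<std::string> &args) {
  const std::string prefix = "/clang:";
  std::vector<std::string> regular, forwarded;
  for (const std::string &arg : args) {
    if (arg.compare(0, prefix.size(), prefix) == 0)
      forwarded.push_back(arg.substr(prefix.size()));  // strip the prefix
    else
      regular.push_back(arg);
  }
  regular.insert(regular.end(), forwarded.begin(), forwarded.end());
  return regular;
}
```

In this model, the input /clang:-O3 /c t.cpp is considered in the order /c t.cpp -O3, even though the /clang: option was written first.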

The /Zc:dllexportInlines- Option

This causes the class-level dllexport and dllimport attributes to not apply to inline member functions, as they otherwise would. For example, in the code below S::foo() would normally be defined and exported by the DLL, but when using the /Zc:dllexportInlines- flag it is not:

struct __declspec(dllexport) S {
  void foo() {}
};

This has the benefit that the compiler doesn’t need to emit a definition of S::foo() in every translation unit where the declaration is included, as it would otherwise do to ensure there’s a definition in the DLL even if it’s not used there. If the declaration occurs in a header file that’s widely used, this can save significant compilation time and output size. It also reduces the number of functions exported by the DLL similarly to what -fvisibility-inlines-hidden does for shared objects on ELF and Mach-O. Since the function declaration comes with an inline definition, users of the library can use that definition directly instead of importing it from the DLL.

Note that the Microsoft Visual C++ compiler does not support this option, and if code in a DLL is compiled with /Zc:dllexportInlines-, the code using the DLL must be compiled in the same way so that it doesn’t attempt to dllimport the inline member functions. The reverse scenario should generally work though: a DLL compiled without this flag (such as a system library compiled with Visual C++) can be referenced from code compiled using the flag, meaning that the referencing code will use the inline definitions instead of importing them from the DLL.

Also note that, as when using -fvisibility-inlines-hidden, the address of S::foo() will be different inside and outside the DLL, breaking the C/C++ standard requirement that functions have a unique address.

The flag does not apply to explicit class template instantiation definitions or declarations, as those are typically used to explicitly provide a single definition in a DLL (dllexported instantiation definition) or to signal that the definition is available elsewhere (dllimport instantiation declaration). It also doesn’t apply to inline members with static local variables, to ensure that the same instance of the variable is used inside and outside the DLL.

Using this flag can cause problems when inline functions that would otherwise be dllexported refer to internal symbols of a DLL. For example:

void internal();

struct __declspec(dllimport) S {
  void foo() { internal(); }
};

Normally, references to S::foo() would use the definition in the DLL from which it was exported, and which presumably also has the definition of internal(). However, when using /Zc:dllexportInlines-, the inline definition of S::foo() is used directly, resulting in a link error since internal() is not available. Even worse, if there is an inline definition of internal() containing a static local variable, we will now refer to a different instance of that variable than in the DLL:

inline int internal() {
  static int x;
  return x++;
}

struct __declspec(dllimport) S {
  int foo() { return internal(); }
};

This could lead to very subtle bugs. Using -fvisibility-inlines-hidden can lead to the same issue. To avoid it in this case, make S::foo() or internal() non-inline, or mark them dllimport/dllexport explicitly.
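
One of the suggested fixes, making S::foo() non-inline, could look like the following on the DLL side. This is a minimal sketch: SKETCH_EXPORT is a hypothetical macro that expands to nothing outside Windows so the code also compiles elsewhere, and code using the DLL would see the class as dllimport instead.

```cpp
// SKETCH_EXPORT is a hypothetical macro for the DLL's own build; it is empty
// on non-Windows platforms so that the sketch compiles everywhere.
#if defined(_WIN32)
#define SKETCH_EXPORT __declspec(dllexport)
#else
#define SKETCH_EXPORT
#endif

inline int internal() {
  static int x;
  return x++;
}

struct SKETCH_EXPORT S {
  int foo();  // declared only: no inline definition for users to pick up
};

// Defined out of line inside the DLL, so every caller goes through the DLL's
// copy of foo() and therefore the DLL's instance of internal()'s static x.
int S::foo() { return internal(); }
```

Because foo() is no longer an inline member, /Zc:dllexportInlines- does not affect it, and the class-level attribute applies to it as usual.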

Finding Clang runtime libraries

clang-cl supports several features that require runtime library support:

  • Address Sanitizer (ASan): -fsanitize=address

  • Undefined Behavior Sanitizer (UBSan): -fsanitize=undefined

  • Code coverage: -fprofile-instr-generate -fcoverage-mapping

  • Profile Guided Optimization (PGO): -fprofile-generate

  • Certain math operations (int128 division) require the builtins library

In order to use these features, the user must link the right runtime libraries into their program. These libraries are distributed alongside Clang in the library resource directory. Clang searches for the resource directory by searching relative to the Clang executable. For example, if LLVM is installed in C:\Program Files\LLVM, then the profile runtime library will be located at the path C:\Program Files\LLVM\lib\clang\11.0.0\lib\windows\clang_rt.profile-x86_64.lib.

For UBSan, PGO, and coverage, Clang will emit object files that auto-link the appropriate runtime library, but the user generally needs to help the linker (whether it is lld-link.exe or MSVC link.exe) find the library resource directory. Using the example installation above, this would mean passing /LIBPATH:C:\Program Files\LLVM\lib\clang\11.0.0\lib\windows to the linker. If the user links the program with the clang or clang-cl drivers, the driver will pass this flag for them.
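The path layout in the example installation can be sketched as string composition. The helper names below are hypothetical and not part of Clang; the install root and version component are taken from the example above and will differ between installations and releases.

```cpp
#include <string>

// Hypothetical helpers for illustration only; they are not part of Clang.

// Directory holding the Windows runtime libraries of one installation.
std::string runtimeLibDir(const std::string &installRoot,
                          const std::string &clangVersion) {
  return installRoot + "\\lib\\clang\\" + clangVersion + "\\lib\\windows";
}

// File name of a runtime library, e.g. clang_rt.profile-x86_64.lib.
std::string runtimeLibName(const std::string &component,
                           const std::string &arch) {
  return "clang_rt." + component + "-" + arch + ".lib";
}

// Linker flag that adds the runtime library directory to the search path.
std::string libpathFlag(const std::string &installRoot,
                        const std::string &clangVersion) {
  return "/LIBPATH:" + runtimeLibDir(installRoot, clangVersion);
}
```

A build script that invokes the linker directly could compute the flag this way instead of hard-coding it. Since the path contains a space, it would need to be quoted when passed on a command line.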

The auto-linking can be disabled with -fno-rtlib-defaultlib. If that flag is used, pass the required libraries to the linker explicitly, as described for ASan below.

If the linker cannot find the appropriate library, it will emit an error like this:

    $ clang-cl -c -fsanitize=undefined t.cpp

    $ lld-link t.obj -dll
    lld-link: error: could not open 'clang_rt.ubsan_standalone-x86_64.lib': no such file or directory
    lld-link: error: could not open 'clang_rt.ubsan_standalone_cxx-x86_64.lib': no such file or directory

    $ link t.obj -dll -nologo
    LINK : fatal error LNK1104: cannot open file 'clang_rt.ubsan_standalone-x86_64.lib'

To fix the error, add the appropriate /libpath: flag to the link line.

For ASan, as of this writing, the user is also responsible for linking against the correct ASan libraries.

If the user is using the dynamic CRT (/MD), then they should add clang_rt.asan_dynamic-x86_64.lib to the link line as a regular input. For other architectures, replace x86_64 with the appropriate name here and below.

If the user is using the static CRT (/MT), then different runtimes are used to produce DLLs and EXEs. To link a DLL, pass clang_rt.asan_dll_thunk-x86_64.lib. To link an EXE, pass -wholearchive:clang_rt.asan-x86_64.lib.
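The ASan rules above amount to a small decision table. The sketch below encodes it; the enum and function names are hypothetical, and it only reflects the three cases described here rather than anything the driver itself exposes.

```cpp
#include <string>

// Hypothetical types for illustration only.
enum class Crt { Dynamic, Static };  // /MD versus /MT
enum class Output { Dll, Exe };

// Returns the extra linker input for ASan in the three cases described above.
std::string asanLinkInput(Crt crt, Output output, const std::string &arch) {
  if (crt == Crt::Dynamic)
    return "clang_rt.asan_dynamic-" + arch + ".lib";  // Same for DLLs and EXEs.
  if (output == Output::Dll)
    return "clang_rt.asan_dll_thunk-" + arch + ".lib";
  return "-wholearchive:clang_rt.asan-" + arch + ".lib";
}
```

For example, asanLinkInput(Crt::Static, Output::Exe, "x86_64") yields the -wholearchive: form shown in the text.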

Windows System Headers and Library Lookup

clang-cl uses a set of different approaches to locate the right system libraries to link against when building code. The Windows environment uses libraries from three distinct sources:

  1. Windows SDK

  2. UCRT (Universal C Runtime)

  3. Visual C++ Tools (VCRuntime)

The Windows SDK provides the import libraries and headers required to build programs against the Windows system packages. Underlying the Windows SDK is the UCRT, the universal C runtime.

This difference is best illustrated by the various headers that one would find in the different categories. The WinSDK would contain headers such as WinSock2.h, which is part of the Windows API surface and provides the Windows sockets interfaces for networking. UCRT provides the C library headers, including e.g. stdio.h. Finally, the Visual C++ tools provide the underlying Visual C++ Runtime headers such as stdint.h or crtdefs.h.

Various controls let the user determine where clang-cl will locate these headers. The default behaviour for the Windows SDK and UCRT is as follows:

  1. Consult the command line.

    Anything the user specifies is always given precedence. The following extensions are part of the clang-cl toolset:

    • /winsysroot:

    /winsysroot: is used as an equivalent to --sysroot on Unix environments. It specifies an alternate location to be treated as the system root. When specified, it will be used as the root where the Windows Kits directory is located.

    • /winsdkversion:

    • /winsdkdir:

    If /winsysroot: is not specified, the /winsdkdir: argument is consulted as a location to identify where the Windows SDK is located. Unlike /winsysroot:, /winsdkdir: is expected to be the complete path rather than a root in which to locate Windows Kits.

    The /winsdkversion: flag allows the user to specify a version identifier for the SDK to prefer. When this is specified, no additional validation is performed and this version is preferred. If the version is not specified, the highest detected version number will be used.

  2. Consult the environment.

    TODO: This is not yet implemented.

    This will consult the environment variables:

    • WindowsSdkDir

    • UCRTVersion

  3. Fallback to the registry.

    If no arguments are used to indicate where the SDK is present, and the compiler is running on Windows, the registry is consulted to locate the installation.
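The version preference described for /winsdkversion: in step 1 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, an empty string stands for "not specified", and versions are assumed to be dotted decimal numbers compared component by component, which may not match how Clang itself orders detected versions.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a dotted version such as "10.0.19041.0" into numeric components.
// Assumes every component is a non-negative decimal number.
std::vector<unsigned long> parseVersion(const std::string &version) {
  std::vector<unsigned long> parts;
  std::stringstream stream(version);
  std::string piece;
  while (std::getline(stream, piece, '.'))
    parts.push_back(std::stoul(piece));
  return parts;
}

// Hypothetical helper: `requested` models the /winsdkversion: value (empty if
// the flag was not given), `detected` the versions found on disk.
// Returns an empty string when nothing was requested and nothing was detected.
std::string pickSdkVersion(const std::string &requested,
                           const std::vector<std::string> &detected) {
  if (!requested.empty())
    return requested;  // Preferred as-is, without further validation.
  std::string best;
  for (const std::string &candidate : detected)
    if (best.empty() || parseVersion(best) < parseVersion(candidate))
      best = candidate;  // Keep the numerically highest version seen so far.
  return best;
}
```

Comparing numeric components rather than raw strings matters here, because a plain string comparison would rank 10.0.9.0 above 10.0.19041.0.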

The Visual C++ Toolset has a slightly more elaborate mechanism for detection.

  1. Consult the command line.

    • /winsysroot:

    /winsysroot: is used as an equivalent to --sysroot on Unix environments. It specifies an alternate location to be treated as the system root. When specified, it will be used as the root where the VC directory is located.

    • /vctoolsdir:

    • /vctoolsversion:

    If /winsysroot: is not specified, the /vctoolsdir: argument is consulted as a location to identify where the Visual C++ Tools are located. If /vctoolsversion: is specified, that version is preferred; otherwise, the highest version detected is used.

  2. Consult the environment.

    • /external:[VARIABLE]

      This specifies a user-identified environment variable whose value is treated as a path delimiter (;) separated list of paths. These paths are mapped to -imsvc arguments, which are treated as -isystem.

    • INCLUDE and EXTERNAL_INCLUDE

      The path delimiter (;) separated list of paths will be mapped to -imsvc arguments which are treated as -isystem.

    • LIB (indirectly)

      The linker link.exe or lld-link.exe will honour the environment variable LIB, which is a path delimiter (;) separated list of paths to consult for the import libraries to use when linking the final target.

    The following environment variables will be consulted and used to form paths to validate and load content from as appropriate:

    • VCToolsInstallDir

    • VCINSTALLDIR

    • Path

  3. Consult ISetupConfiguration [Windows Only]

    Assuming that the toolchain is built with USE_MSVC_SETUP_API defined and is running on Windows, the Visual Studio COM interface ISetupConfiguration will be used to locate the installation of the MSVC toolset.

  4. Fallback to the registry [DEPRECATED]

    The registry information is used to help locate the installation as a final fallback. This is only possible for pre-VS2017 installations and is considered deprecated.
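The mapping of INCLUDE-style variables to -imsvc arguments described in step 2 can be sketched as a simple split on the path delimiter. The function name is hypothetical, and skipping empty entries as well as emitting the flag and path as separate arguments are assumptions of this sketch rather than documented driver behaviour.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: turns the value of a variable such as INCLUDE into a
// flat list of "-imsvc", "<path>" argument pairs.
std::vector<std::string> toImsvcArgs(const std::string &pathList) {
  std::vector<std::string> args;
  std::string::size_type start = 0;
  while (start <= pathList.size()) {
    std::string::size_type end = pathList.find(';', start);
    if (end == std::string::npos)
      end = pathList.size();
    if (end > start) {  // Assumption: empty entries are ignored.
      args.push_back("-imsvc");
      args.push_back(pathList.substr(start, end - start));
    }
    start = end + 1;  // Continue after the delimiter.
  }
  return args;
}
```

The value would typically come from std::getenv("INCLUDE"), with a null result treated as an empty list.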

Restrictions and Limitations compared to Clang

Strict aliasing (TBAA) is always off by default in clang-cl whereas in clang, strict aliasing is turned on by default for all optimization levels. For more details, see Strict aliasing.
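To illustrate the kind of code this default affects, the sketch below contrasts two ways of reading the bits of a float. The function names are hypothetical, and the sketch assumes a 32-bit float. The cast-based version breaks the language's aliasing rules, so it has undefined behavior and can be miscompiled when strict aliasing is enabled; the memcpy-based version is well defined under either setting.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Violates the aliasing rules: a float object is read through a uint32_t
// lvalue. Code like this depends on strict aliasing being disabled.
std::uint32_t bitsViaCast(float value) {
  return *reinterpret_cast<std::uint32_t *>(&value);
}

// Well defined regardless of the strict aliasing setting: the bytes are
// copied into an object of the destination type.
std::uint32_t bitsViaMemcpy(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}
```

Code that must build with both clang and clang-cl is safer written in the memcpy style, since it does not depend on which driver's default is in effect.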