

Inline expansion

From Wikipedia, the free encyclopedia

Optimization replacing a function call with that function's source code

This article needs additional citations for verification (December 2013). Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.

In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compiling, without changing the source code (the text), while macro expansion occurs before compiling, and results in different text that is then processed by the compiler.

Inlining is an important optimization, but has complex effects on performance.^[1] As a rule of thumb, some inlining will improve speed at very minor cost of space, but excess inlining will hurt speed, due to inlined code consuming too much of the instruction cache, and also cost significant space. A survey of the modest academic literature on inlining from the 1980s and 1990s is given in Peyton Jones & Marlow 1999.^[2]

Overview

Inline expansion is similar to macro expansion in that the compiler places a new copy of the function body at each place the function is called. Inlined functions run somewhat faster than normal functions because the function-call overhead is avoided; however, there is a memory penalty. If a function is inlined 10 times, 10 copies of its body are inserted into the code. Inlining is therefore best suited to small functions that are called often. In C++, member functions defined within the class definition are implicitly inline (there is no need to use the inline reserved word); member functions defined outside the class need the keyword. The compiler may ignore the programmer's request to inline a function, particularly if the function is large.

Inline expansion is used to eliminate the time overhead of a function call. It is typically used for functions that execute frequently. It also has a space benefit for very small functions, and is an enabling transformation for other optimizations.

Without inline functions, the compiler decides which functions to inline, and the programmer has little or no control over which functions are inlined and which are not. Inline functions give the programmer this control, which allows application-specific knowledge to be used in choosing which functions to inline.

Ordinarily, when a function is invoked, control is transferred to its definition by a branch or call instruction. With inlining, control drops through directly to the code for the function, without a branch or call instruction.

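Compilers usually implement statements with inlining. Loop conditions and loop bodies need lazy evaluation: the condition must be re-evaluated on every iteration, and the body run only while the condition holds, rather than each being computed once up front as an ordinary function argument would be. This property is fulfilled when the code to compute loop conditions and loop bodies is inlined. Performance considerations are another reason to inline statements.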

In the context of functional programming languages, inline expansion is usually followed by the beta-reduction transformation.^[3]

A programmer might inline a function manually through copy-and-paste programming, as a one-time operation on the source code. However, other methods of controlling inlining (see below) are preferable, because they avoid the bugs that arise when the programmer fixes a bug in the original function but overlooks a (possibly modified) duplicated copy of its body.

Effect on performance

The direct effect of this optimization is to improve time performance (by eliminating call overhead), at the cost of worsening space usage^[a] (due to duplicating the function body). The code expansion due to duplicating the function body dominates, except for simple cases,^[b] and thus the direct effect of inline expansion is to improve time at the cost of space.

However, the main benefit of inline expansion is to allow further optimizations and improved scheduling: inlining increases the size of the function body, and better optimization is possible on larger functions.^[4] The ultimate impact of inline expansion on speed is complex, due to multiple effects on the performance of the memory system (mainly the instruction cache), which dominates performance on modern processors: depending on the specific program and cache, inlining particular functions can increase or decrease performance.^[1]

The impact of inlining varies by programming language and program, due to different degrees of abstraction. In lower-level imperative languages such as C and Fortran it is typically a 10–20% speed boost, with minor impact on code size, while in more abstract languages it can be significantly more important, due to the number of layers inlining removes; an extreme example is Self, where one compiler saw improvement factors of 4 to 55 by inlining.^[2]

The direct benefits of eliminating a function call are:

It eliminates instructions needed for a function call, both in the calling function and in the callee: placing arguments on a stack or in registers, the function call itself, the function prologue, then at return the function epilogue, the return statement, then getting the return value back, removing arguments from stacks, and restoring registers (if needed).
Because no registers are needed to pass arguments, it reduces register spilling.
It eliminates having to pass references and then dereference them, when using call by reference (or call by address, or call by sharing).
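As a rough illustration of the last point, the sketch below uses a hypothetical helper, add_one, that reads and writes through pointers. A non-inlined call must pass two addresses and the callee must dereference them; once the body is placed at the call site, both values can live in ordinary local variables (typically registers). The "after" version is written by hand and only mimics what an optimizing compiler might do internally.

```c
/* Hypothetical helper: reads its input and writes its result through
   pointers, so a real call must pass addresses that the callee then
   dereferences. */
static void add_one(const int *in, int *out)
{
    *out = *in + 1;
}

/* Caller before inlining: the addresses of two locals are taken. */
int caller_with_call(int x)
{
    int result;
    add_one(&x, &result);
    return result;
}

/* The same caller after manual inlining: the pointer parameters are
   gone, so x and result can stay in plain local variables. */
int caller_inlined(int x)
{
    int result;
    result = x + 1;
    return result;
}
```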

The main benefit of inlining, however, is the further optimizations it allows. Optimizations that cross function boundaries can be done without requiring interprocedural optimization (IPO): once inlining has been performed, added intraprocedural optimizations ("global optimizations") become possible on the enlarged function body. For example:

A constant passed as an argument can often be propagated to all instances of the matching parameter, or part of the function may be "hoisted out" of a loop (via loop-invariant code motion).
Register allocation can be done across the larger function body.
High-level optimizations, such as escape analysis and tail duplication, can be performed on a larger scope and be more effective, more so if the compiler implementing those optimizations relies mainly on intra-procedural analysis.^[5] These can be done without inlining, but require a significantly more complex compiler and linker (in case caller and callee are in separate compilation units).
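A hand-written sketch of the first point, using a hypothetical callee, scale, whose behavior depends on a mode argument. Before inlining, an optimizer that works on one function at a time sees only an opaque call inside the loop. After inlining, the constant argument 0 folds the conditional, and the resulting factor is loop-invariant, so it can be computed once outside the loop. The "after" function shows what such a compiler might produce, written out as C for readability.

```c
/* Hypothetical callee: scales v by a factor chosen from mode. */
static int scale(int v, int mode)
{
    int factor = (mode == 0) ? 2 : 3;
    return v * factor;
}

/* Before inlining: the constant 0 is hidden behind a call, so the
   conditional inside scale() cannot be resolved in this function. */
int sum_scaled_call(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scale(a[i], 0);
    return sum;
}

/* After inlining by hand: with mode known to be 0 the conditional
   folds to 2, and that loop-invariant value is hoisted out. */
int sum_scaled_inlined(const int *a, int n)
{
    const int factor = 2;   /* (mode == 0) ? 2 : 3, with mode = 0 */
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * factor;
    return sum;
}
```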

Conversely, in some cases a language specification may allow a program to make additional assumptions about arguments to procedures that it can no longer make after the procedure is inlined, preventing some optimizations. Smarter compilers (such as the Glasgow Haskell Compiler (GHC)) will track this, but naive inlining loses this information.

A further benefit of inlining for the memory system is:

Eliminating branches and keeping code that is executed together close together in memory improves instruction cache performance by improving locality of reference (spatial locality and sequentiality of instructions). This effect is smaller than that of optimizations that specifically target sequentiality, but is significant.^[6]

The direct cost of inlining is increased code size, due to duplicating the function body at each call site. However, inlining does not always increase code size. For very short functions, such as trivial accessor methods or mutator methods (getters and setters), the function body is smaller than the code for a function call (at the caller, including argument and return value handling); and a function that is only used in one place is not duplicated at all. Thus inlining may be minimized or eliminated if optimizing for code size, as is often the case in embedded systems.

Inlining also imposes a cost on performance, because the code expansion caused by duplication hurts instruction cache performance.^[7] This is most significant if, before expansion, the working set of the program (or a hot section of code) fit in one level of the memory hierarchy (e.g., L1 cache), but after expansion it no longer fits, resulting in frequent cache misses at that level. Due to the significant difference in performance at different levels of the hierarchy, this hurts performance considerably. At the highest level this can result in increased page faults, catastrophic performance degradation due to thrashing, or the program failing to run at all. The last is rare in common desktop and server applications, where code size is small relative to available memory, but can be an issue for resource-constrained environments such as embedded systems. One way to mitigate this problem is to split functions into a smaller hot inline path (fast path) and a larger cold non-inline path (slow path).^[7]
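A minimal sketch of the fast-path/slow-path split, assuming a hypothetical bump allocator (struct arena, arena_alloc and arena_alloc_slow are illustrative names, not from any library). Only the short common case is declared inline; the rarer fallback stays an ordinary out-of-line function, so duplicating the hot path at many call sites adds little code. Since inline is only a hint, the compiler may still make its own decision.

```c
#include <stddef.h>

/* Hypothetical bump allocator over a small fixed buffer. */
struct arena {
    char   buf[64];
    size_t used;
    int    slow_calls;   /* counts how often the cold path ran */
};

/* Cold path: an ordinary out-of-line function for the rare case where
   the buffer is exhausted. Purely for illustration, it restarts the
   arena from the beginning. */
void *arena_alloc_slow(struct arena *a, size_t n)
{
    a->slow_calls++;
    if (n > sizeof a->buf)
        return NULL;
    a->used = n;
    return a->buf;
}

/* Hot path: short enough that inlining it at every call site is cheap;
   it only falls back to the out-of-line call when space runs out. */
static inline void *arena_alloc(struct arena *a, size_t n)
{
    if (n <= sizeof a->buf - a->used) {
        void *p = a->buf + a->used;
        a->used += n;
        return p;
    }
    return arena_alloc_slow(a, n);
}

/* Usage: five 16-byte requests against a 64-byte arena. The first four
   take the fast path; the fifth takes the slow path. */
int count_slow_path_calls(void)
{
    struct arena a = { .used = 0, .slow_calls = 0 };
    for (int i = 0; i < 5; i++)
        arena_alloc(&a, 16);
    return a.slow_calls;
}
```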

Inlining hurting performance is a problem mainly for large functions that are used in many places, but the break-even point beyond which inlining reduces performance is difficult to determine and depends in general on the precise load, so it can be subject to manual optimization or profile-guided optimization.^[8] This is a similar issue to other code-expanding optimizations such as loop unrolling, which also reduces the number of instructions processed, but can decrease performance due to poorer cache performance.

The precise effect of inlining on cache performance is complex. For small cache sizes (much smaller than the working set before expansion), the increased sequentiality dominates, and inlining improves cache performance. For cache sizes close to the working set, where inlining expands the working set so it no longer fits in cache, this dominates and cache performance decreases. For cache sizes larger than the working set, inlining has negligible impact on cache performance. Further, changes in cache design, such as load forwarding, can offset the increase in cache misses.^[9]

Compiler support

Compilers use a variety of mechanisms to decide which function calls should be inlined; these can include manual hints from programmers for specific functions, together with overall control via command-line options. Inlining is done automatically by many compilers in many languages, based on a judgment of whether inlining is beneficial, while in other cases it can be specified manually, typically with a keyword or compiler directive named inline. Typically this only hints that inlining is desired, rather than requiring it, with the force of the hint varying by language and compiler.
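As one concrete case, GCC and Clang treat the standard inline keyword as a hint and additionally accept the non-standard function attributes always_inline and noinline, which override their heuristics in either direction. The sketch assumes one of those two compilers; other compilers spell such directives differently (MSVC, for instance, uses __forceinline). The function names are illustrative.

```c
/* Standard C hint: the compiler is free to ignore it. */
static inline int square(int x)
{
    return x * x;
}

/* GCC/Clang extension: request inlining even where the heuristics
   would decline (the function must still be inlinable). */
static inline __attribute__((always_inline)) int cube(int x)
{
    return x * square(x);
}

/* GCC/Clang extension: forbid inlining, e.g. to keep a function out of
   hot code or to keep it visible as a separate frame in a profiler. */
__attribute__((noinline)) int sum_of_cubes(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += cube(i);
    return total;
}
```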

Typically, compiler developers keep the above performance issues in mind, and incorporate heuristics into their compilers that choose which functions to inline so as to improve performance, rather than worsening it, in most cases.

Implementation

Once the compiler has decided to inline a particular function, performing the inlining operation itself is usually simple. Depending on whether a compiler inlines functions across code in different languages, the compiler can inline on either a high-level intermediate representation (like abstract syntax trees) or a low-level intermediate representation. In either case, the compiler simply computes the arguments, stores them in variables corresponding to the function's arguments, and then inserts the body of the function at the call site.
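Storing the arguments in variables first matters when an argument has side effects or the parameter appears several times in the body: each argument expression must still be evaluated exactly once, as it would be for a real call. A hand-written sketch with hypothetical functions twice and next_value:

```c
/* Hypothetical callee whose parameter appears twice in its body. */
static int twice(int v)
{
    return v + v;
}

static int counter;

/* Argument expression with a side effect. */
static int next_value(void)
{
    return ++counter;
}

/* Original call site. */
int call_site_original(void)
{
    return twice(next_value());
}

/* What an inliner produces: the argument is computed once into a fresh
   variable standing for the parameter v, then the body is copied with
   v replaced by that variable. Substituting the argument expression
   textually would call next_value() twice. */
int call_site_inlined(void)
{
    int v_arg = next_value();   /* evaluate the argument once */
    return v_arg + v_arg;       /* inlined body of twice() */
}
```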

Linkers can also do function inlining. When a linker inlines functions, it may inline functions whose source is not available, such as library functions (see link-time optimization). A runtime system can also inline functions. Runtime inlining can use dynamic profiling information to make better decisions about which functions to inline, as in the Java HotSpot compiler.^[10]

Here is a simple example of inline expansion performed "by hand" at the source level in the C language:

```c
int pred(int x)
{
    if (x == 0) {
        return 0;
    } else {
        return x - 1;
    }
}
```

Before inlining:

```c
int func(int y)
{
    return pred(y) + pred(0) + pred(y + 1);
}
```

After inlining:

```c
int func(int y)
{
    int tmp;
    // (1)
    if (y == 0) { tmp = 0; } else { tmp = y - 1; }
    // (2)
    if (0 == 0) { tmp += 0; } else { tmp += 0 - 1; }
    // (3)
    if (y + 1 == 0) { tmp += 0; } else { tmp += (y + 1) - 1; }
    return tmp;
}
```

Note that this is only an example. In an actual C application, it would be preferable to use an inlining language feature such as parameterized macros or inline functions to tell the compiler to transform the code in this way. The Benefits section below lists ways to optimize this code.

Inlining by assembly macro expansion

Assembler macros provide an alternative approach to inlining, whereby a sequence of instructions can normally be generated inline by macro expansion from a single macro source statement (with zero or more parameters). One of the parameters might be an option to instead generate a one-time separate subroutine containing the sequence, which is then invoked by a call rather than expanded inline at each use. Example:

MOVE FROM=array1,TO=array2,INLINE=NO

Heuristics

A range of different heuristics have been explored for inlining. Usually, an inlining algorithm has a certain code budget (an allowed increase in program size) and aims to inline the most valuable callsites without exceeding that budget. In this sense, many inlining algorithms are usually modeled after the knapsack problem.^[11] To decide which callsites are more valuable, an inlining algorithm must estimate their benefit, i.e. the expected decrease in the execution time. Commonly, inliners use profiling information about the frequency of the execution of different code paths to estimate the benefits.^[12]
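A minimal sketch of the budget-driven approach, assuming each call site already carries an estimated benefit (for example, profiled execution frequency times the cycles saved) and a positive code-size cost. It uses the common greedy approximation for the knapsack problem (highest benefit-per-cost first); production inliners are considerably more involved, and the struct and field names here are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical description of one call site. */
struct callsite {
    int    id;
    double benefit;   /* estimated execution-time saving */
    int    cost;      /* estimated code-size increase, must be > 0 */
    int    inlined;   /* set to 1 if the site is selected */
};

/* qsort comparator: order by benefit-per-cost, highest first. */
static int by_ratio_desc(const void *pa, const void *pb)
{
    const struct callsite *a = pa, *b = pb;
    double ra = a->benefit / a->cost;
    double rb = b->benefit / b->cost;
    return (ra < rb) - (ra > rb);
}

/* Greedy knapsack: inline the most valuable sites that still fit in
   the code-size budget, and return the part of the budget used.
   Note that the array is reordered by the sort. */
int select_inlines(struct callsite *sites, size_t n, int budget)
{
    int used = 0;
    qsort(sites, n, sizeof *sites, by_ratio_desc);
    for (size_t i = 0; i < n; i++) {
        if (used + sites[i].cost <= budget) {
            sites[i].inlined = 1;
            used += sites[i].cost;
        }
    }
    return used;
}
```

With a budget of 10 and sites whose ratios are 20/4, 10/5 and 9/6, the first two are chosen (cost 9) and the third no longer fits.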

In addition to profiling information, newer just-in-time compilers apply several more advanced heuristics, such as:^[5]

Speculating which code paths will result in the best reduction in execution time (by enabling additional compiler optimizations as a result of inlining) and increasing the perceived benefit of such paths.
Adaptively adjusting the benefit-per-cost threshold for inlining based on the size of the compiling unit and the amount of code already inlined.
Grouping subroutines into clusters, and inlining entire clusters instead of singular subroutines. Here, the heuristic guesses the clusters by grouping those methods for which inlining just a proper subset of the cluster leads to a worse performance than inlining nothing at all.

Benefits

Inline expansion itself is an optimization, since it eliminates overhead from calls, but it is much more important as an enabling transformation. That is, once the compiler expands a function body in the context of its call site, often with arguments that may be fixed constants, it may be able to do a variety of transformations that were not possible before. For example, a conditional branch may turn out to be always true or always false at this particular call site. This in turn may enable dead code elimination, loop-invariant code motion, or induction variable elimination.

In the C example in the Implementation section above, optimizing opportunities abound. The compiler may follow this sequence of steps:

The tmp += 0 statements in the lines marked (2) and (3) do nothing. The compiler can remove them.
The condition 0 == 0 is always true, so the compiler can replace the line marked (2) with the consequent, tmp += 0 (which does nothing).
The compiler can rewrite the condition y+1 == 0 to y == -1.
The compiler can reduce the expression (y + 1) - 1 to y.
The expressions y and y+1 cannot both equal zero. This lets the compiler eliminate one test.
In statements such as if (y == 0) return y, the value of y is known to be 0 in the body, so it can be replaced by that constant.

The new function looks like:

```c
int func(int y)
{
    if (y == 0) {
        return 0;
    }
    if (y == -1) {
        return -2;
    }
    return 2 * y - 1;
}
```
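Because every step above must preserve meaning, the simplified function should agree with the original definition on every input. A small check of that, with the two versions given the hypothetical names func_original and func_optimized so that they can coexist in one file (the tested range is kept small to stay clear of signed overflow):

```c
int pred(int x)
{
    if (x == 0) {
        return 0;
    } else {
        return x - 1;
    }
}

/* Definition before inlining. */
int func_original(int y)
{
    return pred(y) + pred(0) + pred(y + 1);
}

/* Result of inlining followed by the simplifications above. */
int func_optimized(int y)
{
    if (y == 0) {
        return 0;
    }
    if (y == -1) {
        return -2;
    }
    return 2 * y - 1;
}

/* Returns 1 if both versions agree for every y in [lo, hi]. */
int versions_agree(int lo, int hi)
{
    for (int y = lo; y <= hi; y++)
        if (func_original(y) != func_optimized(y))
            return 0;
    return 1;
}
```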

Limits

Complete inline expansion is not always possible, due to recursion: recursively inline expanding the calls will not terminate. There are various solutions, such as expanding a bounded amount, or analyzing the call graph and breaking loops at certain nodes (i.e., not expanding some edge in a recursive loop).^[13] An identical problem occurs in macro expansion, as recursive expansion does not terminate, and is typically resolved by forbidding recursive macros (as in C and C++).
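Expanding a bounded amount can be illustrated by hand: inlining a recursive function into itself to a fixed depth of one leaves a residual recursive call, so the expansion terminates while each remaining call does two levels of work. The sketch works on C source for readability; a compiler would perform the equivalent transformation on its intermediate representation.

```c
/* Original recursive function. */
unsigned long fact(unsigned n)
{
    return n <= 1 ? 1 : n * fact(n - 1);
}

/* fact inlined into itself once: the recursive call fact(n - 1) is
   replaced by a copy of the body with m = n - 1 substituted for n.
   The copy keeps an ordinary, unexpanded recursive call, which is
   what bounds the expansion. */
unsigned long fact_unrolled(unsigned n)
{
    if (n <= 1)
        return 1;
    unsigned m = n - 1;   /* argument of the inlined call */
    return n * (m <= 1 ? 1 : m * fact_unrolled(m - 1));
}
```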

Comparison with macros

Traditionally, in languages such as C, inline expansion was accomplished at the source level using parameterized macros. Use of true inline functions, as are available in C99, provides several benefits over this approach:

In C, macro invocations do not perform type checking, or even check that arguments are well-formed, whereas function calls usually do.
In C, a macro cannot use the return keyword with the same meaning as a function would (a return in the macro body would make the function containing the expansion return, rather than just ending the macro). In other words, a macro cannot return anything which is not the result of the last expression invoked inside it.
Since C macros use mere textual substitution, this may result in unintended side effects and inefficiency due to re-evaluation of arguments and order of operations.
Compiler errors within macros are often difficult to understand, because they refer to the expanded code, rather than the code the programmer typed. Thus, debugging information for inlined code is usually more helpful than that of macro-expanded code.
Many constructs are awkward or impossible to express using macros, or use a significantly different syntax. Inline functions use the same syntax as regular functions, and can be inlined and un-inlined at will with ease.
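The re-evaluation problem can be seen in a small sketch. The macro SQUARE, the inline function square, and the counting helper next_id are hypothetical names chosen for illustration; next_id always returns 3, so the result does not depend on evaluation order, only the number of calls differs.

```c
/* Textual substitution: the argument expression appears twice. */
#define SQUARE(x) ((x) * (x))

/* Inline-function equivalent: the argument is evaluated once and is
   type-checked as an int. */
static inline int square(int x)
{
    return x * x;
}

static int calls;

/* Argument with a side effect: always returns 3, counting each call. */
static int next_id(void)
{
    calls++;
    return 3;
}

/* How many times next_id() runs for SQUARE(next_id()). */
int macro_evaluations(void)
{
    calls = 0;
    (void)SQUARE(next_id());   /* expands to ((next_id()) * (next_id())) */
    return calls;
}

/* How many times next_id() runs for square(next_id()). */
int function_evaluations(void)
{
    calls = 0;
    (void)square(next_id());
    return calls;
}
```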

Many compilers can also inline expand some recursive functions;^[14] recursive macros are typically illegal.

Bjarne Stroustrup, the designer of C++, likes to emphasize that macros should be avoided wherever possible, and advocates extensive use of inline functions.

Selection methods

Many compilers aggressively inline functions wherever it is beneficial to do so. Although it can lead to larger executables, aggressive inlining has nevertheless become more and more desirable as memory capacity has increased faster than CPU speed. Inlining is a critical optimization in languages for functional and object-oriented programming, which rely on it to provide enough context for their typically small functions to make classical optimizations effective.

Language support

Many languages, including Java and functional languages, do not provide language constructs for inline functions, but their compilers or interpreters often perform aggressive inline expansion.^[5] Other languages provide constructs for explicit hints, generally as compiler directives (pragmas).

The language Ada has a pragma for inline functions.

Functions in Common Lisp may be defined as inline by the inline declaration as such:^[15]

```lisp
(declaim (inline dispatch))
(defun dispatch (x)
  (funcall (get (car x) 'dispatch) x))
```

The Haskell compiler GHC tries to inline functions or values that are small enough, but inlining may also be requested explicitly using a language pragma:^[16]

```haskell
key_function :: Int -> String -> (Bool, Double)
{-# INLINE key_function #-}
```

C and C++

Further information:Inline function

C and C++ have an inline keyword, which serves as a hint that inlining may be beneficial; however, in newer versions, its main purpose is instead to alter the visibility and linking behavior of the function.^[17]
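The linkage aspect shows up in a common C99 idiom: small helpers are defined as static inline in a header. The static keyword gives each translation unit its own internal-linkage copy, so the program links correctly whether or not the compiler actually inlines the calls. A plain inline definition with no external definition anywhere in the program can instead fail to link with an undefined reference when some call is not inlined (for example, at low optimization levels). The helper and header name below are hypothetical:

```c
/* Would typically live in a header such as util.h (hypothetical). */
#include <stddef.h>

/* static: internal linkage, one private copy per translation unit.
   inline: a hint that inlining is desired. */
static inline size_t clamp_index(size_t i, size_t len)
{
    return i < len ? i : len - 1;   /* assumes len > 0 */
}
```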

In C++, a method of a class that is defined inside the class body is implicitly inline.

Kotlin

In Kotlin, a function marked inline will be inlined at its call sites; individual lambda parameters of such a function can be excluded from inlining with the noinline modifier. Inlining copies the function bytecode into the call site, inlines lambda arguments, and eliminates function call and lambda allocation overhead.^[18] In equivalent Java (or Java bytecode), the function will be represented as logic at the call site.

Rust

In Rust, inlining is done automatically by the compiler.^[19] Rust provides an #[inline] attribute that suggests to the compiler that a function should be inlined, but does not guarantee it; the compiler may ignore even #[inline(always)]. In debug mode, the compiler will never inline.^[20]


Notes

[a] Space usage is "number of instructions", and is both runtime space usage and the binary file size.
[b] Code size actually shrinks for very short functions, where the call overhead is larger than the body of the function, or single-use functions, where no duplication occurs.

References

[1] Chen et al. 1993.
[2] Peyton Jones & Marlow 1999, 8. Related work, p. 17.
[3] Jones, Simon Peyton; Marlow, Simon (July 2002). "Secrets of the Glasgow Haskell Compiler inliner". Journal of Functional Programming. 12 (4–5): 393–434. doi:10.1017/S0956796802004331. ISSN 1469-7653.
[4] Chen et al. 1993, 3.4 Function inline expansion, p. 14.
[5] Prokopec et al., An Optimization Driven Incremental Inline Substitution Algorithm for Just-In-Time Compilers, CGO'19 publication about the inliner used in the Graal compiler for the JVM.
[6] Chen et al. 1993, 3.4 Function inline expansion, pp. 19–20.
[7] Benjamin Poulain (August 8, 2013). "Unusual speed boost: size matters".
[8] See for example the Adaptive Optimization System (archived 2011-08-09 at the Wayback Machine) in the Jikes RVM for Java.
[9] Chen et al. 1993, 3.4 Function inline expansion, pp. 24–26.
[10] Description of the inliner used in the Graal JIT compiler for Java.
[11] Scheifler, An Analysis of Inline Substitution for a Structured Programming Language.
[12] Matthew Arnold, Stephen Fink, Vivek Sarkar, and Peter F. Sweeney, A Comparative Study of Static and Profile-based Heuristics for Inlining.
[13] Peyton Jones & Marlow 1999, 4. Ensuring Termination, pp. 6–9.
[14] "Inlining Semantics for Subroutines which are Recursive" by Henry G. Baker.
[15] Declaration INLINE, NOTINLINE at the Common Lisp HyperSpec.
[16] 7.13.5.1. INLINE pragma, Chapter 7. GHC Language Features.
[17] "inline specifier - cppreference.com". en.cppreference.com. Retrieved 2026-01-10.
[18] "Inline functions". kotlinlang.org. JetBrains s.r.o. 23 June 2025.
[19] "Code generation - The Rust Reference". doc.rust-lang.org. Retrieved 2025-05-01.
[20] "When to #[inline] - Standard library developers Guide". std-dev-guide.rust-lang.org. Retrieved 2025-05-01.

Chen, W. Y.; Chang, P. P.; Conte, T. M.; Hwu, W. W. (September 1993). "The effect of code expanding optimizations on instruction cache design" (PDF). IEEE Transactions on Computers. 42 (9): 1045–1057. Bibcode:1993ITCmp..42.1045C. doi:10.1109/12.241594. hdl:2142/74513.
Peyton Jones, Simon; Marlow, Simon (September 1999). Secrets of the Glasgow Haskell Compiler Inliner (Technical report).

External links

"Eliminating Virtual Function Calls in C++ Programs"; Gerald Aigner, Urs Hölzle
"Reducing Indirect Function Call Overhead In C++ Programs"; Brad Calder, Dirk Grunwald
ALTO - A Link-Time Optimizer for the DEC Alpha
"Advanced techniques"; John R. Levine
"Whole Program Optimization with Visual C++ .NET"; Brandon Bray


Retrieved from "https://en.wikipedia.org/w/index.php?title=Inline_expansion&oldid=1334572897"


