Code Optimizations
Code optimizations are inCompiler/Optimize/*
.
Some of the optimizations performed inOptimizer.fs
are:
- Propagation of known values (constants, x = y, lambdas, tuples/records/union-cases of known values)
- Inlining of known lambda values
- Eliminating unused bindings
- Eliminating sequential code when there is no side-effect
- Eliminating switches when we determine definite success or failure of pattern matching
- Eliminating getting fields from an immutable record/tuple/union-case of known value
- Expansion of tuple bindings "let v = (x1,...x3)" to avoid allocations if it's not used as a first class value
- Splitting large functions into multiple methods, especially at match cases, to avoid massive methods that take a long time to JIT
- Removing tailcalls when it is determined that no code in the transitive closure does a tailcall nor recurses
InDetupleArgs.fs
, tuples at call sites are eliminated if possible. Concretely, functions that accept a tuple at all call sites are replaced by functions that accept each of the tuple's arguments individually. This may require inlining to activate.
Considering the following example:
letmax3t=let(x,y,z)=tmaxx(maxyz)max3(1,2,3)
Themax3
function gets rewritten to simply accept three arguments, and depending on how it gets called it will either get inlined at the call site or called with 3 arguments instead of a new tuple. In either case, the tuple allocation is eliminated.
However, sometimes this optimization is not applied unless a function is markedinline
. Consider a more complicated case:
letrecrunWithTupletoffsettimes=letoffsetValuesxyzoffset=(x+offset,y+offset,z+offset)iftimes<=0thentelselet(x,y,z)=tletr=offsetValuesxyzoffsetrunWithTupleroffset(times-1)
The inner functionoffsetValues
will allocate a new tuple when called. However, ifoffsetValues
is marked asinline
then it will no longer allocate a tuple.
Currently, these optimizations are not applied tostruct
tuples or explicitValueTuple
s passed to a function. In most cases, this doesn't matter because the handling ofValueTuple
is well-optimized and may be erased at runtime. However, in the previousrunWithTuple
function, the overhead of allocating aValueTuple
each call ends up being higher than the previous example withinline
applied tooffsetValues
. This may be addressed in the future.
InInnerLambdasToTopLevelFuncs.fs
, inner functions and lambdas are analyzed and, if possible, rewritten into separate methods that do not require anFSharpFunc
allocation.
Consider the following implementation ofsumBy
on an F# list:
letsumByfxs=letrecloopxsacc=matchxswith|[]->acc|x::t->loopt(fx+acc)loopxs0
The innerloop
function is emitted as a separate static method namedloop@2
and incurs no overhead involved with allocating anFSharpFunc
at runtime.
- Performs eta-expansion on under-applied values applied to lambda expressions and does a beta-reduction to bind any known arguments
- Analyzes a sequence expression and translates it into a state machine so that operating on sequences doesn't incur significant closure overhead
Potential future optimizations: Better Inlining
Consider the following example:
letinlinefk=(funx->k(x+1))letinlinegk=(funx->k(x+2))letres=(f<<g)id1// 4
Intermediate values that inherit fromFSharpFunc
are allocated at the call set ofres
to support function composition, even if the functions are marked asinline
. Currently, if this overhead needs removal, you need to rewrite the code to be more like this:
letfx=x+1letgx=x+2letres=id1|>g|>f// 4
The downside of course being that theid
function can't propagate to composed functions, meaning the code is now different despite yielding the same result.
More generally, any time a first-order function is passed as an argument to a second-order function, the first-order function is not inlined even if everything is marked asinline
. This results in a performance penalty.