Yul
Yul (previously also called JULIA or IULIA) is an intermediate language that can be compiled to bytecode for different backends.
Support for EVM 1.0, EVM 1.5 and Ewasm is planned, and it is designed to be a usable common denominator of all three platforms. It can already be used in stand-alone mode and for “inline assembly” inside Solidity, and there is an experimental implementation of the Solidity compiler that uses Yul as an intermediate language. Yul is a good target for high-level optimisation stages that can benefit all target platforms equally.
Motivation and High-level Description
The design of Yul tries to achieve several goals:
Programs written in Yul should be readable, even if the code is generated by a compiler from Solidity or another high-level language.
Control flow should be easy to understand to help in manual inspection, formal verification and optimization.
The translation from Yul to bytecode should be as straightforward as possible.
Yul should be suitable for whole-program optimization.
In order to achieve the first and second goal, Yul provides high-level constructs like for loops, if and switch statements and function calls. These should be sufficient for adequately representing the control flow for assembly programs. Therefore, no explicit statements for SWAP, DUP, JUMPDEST, JUMP and JUMPI are provided, because the first two obfuscate the data flow and the last three obfuscate control flow. Furthermore, functional statements of the form mul(add(x, y), 7) are preferred over pure opcode statements like 7 y x add mul because in the first form, it is much easier to see which operand is used for which opcode.
Even though it was designed for stack machines, Yul does not expose the complexity of the stack itself. The programmer or auditor should not have to worry about the stack.
The third goal is achieved by compiling the higher level constructs to bytecode in a very regular way. The only non-local operation performed by the assembler is name lookup of user-defined identifiers (functions, variables, …) and cleanup of local variables from the stack.
To avoid confusion between concepts like values and references, Yul is statically typed. At the same time, there is a default type (usually the integer word of the target machine) that can always be omitted to help readability.
To keep the language simple and flexible, Yul does not have any built-in operations, functions or types in its pure form. These are added together with their semantics when specifying a dialect of Yul, which allows specializing Yul to the requirements of different target platforms and feature sets.
Currently, there is only one specified dialect of Yul. This dialect uses the EVM opcodes as builtin functions (see below) and defines only the type u256, which is the native 256-bit type of the EVM. Because of that, we will not provide types in the examples below.
Simple Example
The following example program is written in the EVM dialect and computes exponentiation. It can be compiled using solc --strict-assembly. The builtin functions mul and div compute product and division, respectively.
{
    function power(base, exponent) -> result
    {
        switch exponent
        case 0 { result := 1 }
        case 1 { result := base }
        default
        {
            result := power(mul(base, base), div(exponent, 2))
            switch mod(exponent, 2)
                case 1 { result := mul(base, result) }
        }
    }
}
It is also possible to implement the same function using a for-loop instead of with recursion. Here, lt(a, b) computes whether a is less than b.
{
    function power(base, exponent) -> result
    {
        result := 1
        for { let i := 0 } lt(i, exponent) { i := add(i, 1) }
        {
            result := mul(result, base)
        }
    }
}
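Both functions above can be cross-checked with a short Python model. It is a sketch that assumes the EVM dialect's u256 semantics, in which mul wraps modulo 2**256; the helper names are made up for this illustration and are not part of any Solidity tooling.

```python
WORD = 2**256  # u256 values wrap around modulo 2**256


def mul(a: int, b: int) -> int:
    """Model of the EVM `mul` builtin: product modulo 2**256."""
    return (a * b) % WORD


def power_recursive(base: int, exponent: int) -> int:
    """Square-and-multiply, mirroring the switch-based Yul version."""
    if exponent == 0:
        return 1
    if exponent == 1:
        return base
    result = power_recursive(mul(base, base), exponent // 2)  # div(exponent, 2)
    if exponent % 2 == 1:                                      # mod(exponent, 2)
        result = mul(base, result)
    return result


def power_loop(base: int, exponent: int) -> int:
    """Repeated multiplication, mirroring the for-loop Yul version."""
    result = 1
    i = 0
    while i < exponent:        # lt(i, exponent)
        result = mul(result, base)
        i += 1                 # i := add(i, 1)
    return result
```

The recursive version needs roughly log2(exponent) steps, while the loop performs exponent multiplications, which is why square-and-multiply is usually preferred for large exponents.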
At the end of the section, a complete implementation of the ERC-20 standard can be found.
Stand-Alone Usage
You can use Yul in its stand-alone form in the EVM dialect using the Solidity compiler. This will use the Yul object notation so that it is possible to refer to code as data to deploy contracts. This Yul mode is available for the command-line compiler (use --strict-assembly) and for the standard-json interface:
{
    "language": "Yul",
    "sources": { "input.yul": { "content": "{ sstore(0, 1) }" } },
    "settings": {
        "outputSelection": { "*": { "*": ["*"], "": [ "*" ] } },
        "optimizer": { "enabled": true, "details": { "yul": true } }
    }
}
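As a sketch of driving this interface from a script, the Python below builds the same input and pipes it to solc --standard-json. It assumes a solc binary on the PATH; the function names are hypothetical, and the structure of the returned JSON is not inspected here.

```python
import json
import subprocess


def yul_standard_json_input(source: str, optimize: bool = True) -> dict:
    """Build a standard-JSON input for one Yul source, mirroring the example above."""
    return {
        "language": "Yul",
        "sources": {"input.yul": {"content": source}},
        "settings": {
            "outputSelection": {"*": {"*": ["*"], "": ["*"]}},
            "optimizer": {"enabled": optimize, "details": {"yul": True}},
        },
    }


def compile_with_solc(source: str, solc: str = "solc") -> dict:
    """Pipe the input to `solc --standard-json` and parse the JSON it prints.

    Assumes `solc` is installed; compiler diagnostics typically appear in the
    "errors" entry of the returned object rather than as a non-zero exit code.
    """
    payload = json.dumps(yul_standard_json_input(source))
    completed = subprocess.run(
        [solc, "--standard-json"],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)
```

Keeping the input builder separate from the subprocess call makes it easy to inspect or log the exact JSON that is sent to the compiler.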
Warning
Yul is in active development and bytecode generation is only fully implemented for the EVM dialect of Yul with EVM 1.0 as target.
Informal Description of Yul
In the following, we will talk about each individual aspect of the Yul language. In examples, we will use the default EVM dialect.
Syntax
Yul parses comments, literals and identifiers in the same way as Solidity, so you can e.g. use // and /* */ to denote comments. There is one exception: Identifiers in Yul can contain dots (.).
Yul can specify “objects” that consist of code, data and sub-objects. Please see Yul Objects below for details on that. In this section, we are only concerned with the code part of such an object. This code part always consists of a curly-braces delimited block. Most tools support specifying just a code block where an object is expected.
Inside a code block, the following elements can be used (see the later sections for more details):
literals, i.e. 0x123, 42 or "abc" (strings up to 32 characters)
calls to builtin functions, e.g. add(1, mload(0))
variable declarations, e.g. let x := 7, let x := add(y, 3) or let x (initial value of 0 is assigned)
identifiers (variables), e.g. add(3, x)
assignments, e.g. x := add(y, 3)
blocks where local variables are scoped inside, e.g. { let x := 3 { let y := add(x, 1) } }
if statements, e.g. if lt(a, b) { sstore(0, 1) }
switch statements, e.g. switch mload(0) case 0 { revert(0, 0) } default { mstore(0, 1) }
for loops, e.g. for { let i := 0 } lt(i, 10) { i := add(i, 1) } { mstore(i, 7) }
function definitions, e.g. function f(a, b) -> c { c := add(a, b) }
Multiple syntactical elements can follow each other simply separated by whitespace, i.e. there is no terminating ; or newline required.
Literals
As literals, you can use:
Integer constants in decimal or hexadecimal notation.
ASCII strings (e.g. "abc"), which may contain hex escapes \xNN and Unicode escapes \uNNNN where N are hexadecimal digits.
Hex strings (e.g. hex"616263").
In the EVM dialect of Yul, literals represent 256-bit words as follows:
Decimal or hexadecimal constants must be less than 2**256. They represent the 256-bit word with that value as an unsigned integer in big endian encoding.
An ASCII string is first viewed as a byte sequence, by viewing a non-escape ASCII character as a single byte whose value is the ASCII code, an escape \xNN as a single byte with that value, and an escape \uNNNN as the UTF-8 sequence of bytes for that code point. The byte sequence must not exceed 32 bytes. The byte sequence is padded with zeros on the right to reach 32 bytes in length; in other words, the string is stored left-aligned. The padded byte sequence represents a 256-bit word whose most significant 8 bits are the ones from the first byte, i.e. the bytes are interpreted in big endian form.
A hex string is first viewed as a byte sequence, by viewing each pair of contiguous hex digits as a byte. The byte sequence must not exceed 32 bytes (i.e. 64 hex digits), and is treated as above.
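The encoding rules above can be modeled in a few lines of Python. This is an illustrative sketch, not the compiler's implementation: it handles only the \xNN and \uNNNN escapes described here, and the function names are invented for the example.

```python
WORD_BYTES = 32


def string_literal_bytes(body: str) -> bytes:
    r"""Turn the body of a Yul string literal (without quotes) into bytes.

    Plain ASCII characters become one byte each, \xNN becomes the byte NN and
    \uNNNN becomes the UTF-8 encoding of code point NNNN. Other backslash
    escapes are not handled by this sketch.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and body[i + 1 : i + 2] == "x":
            out.append(int(body[i + 2 : i + 4], 16))
            i += 4
        elif ch == "\\" and body[i + 1 : i + 2] == "u":
            out += chr(int(body[i + 2 : i + 6], 16)).encode("utf-8")
            i += 6
        else:
            out += ch.encode("ascii")
            i += 1
    return bytes(out)


def bytes_to_word(data: bytes) -> int:
    """Pad on the right to 32 bytes (left-aligned) and read as big endian."""
    if len(data) > WORD_BYTES:
        raise ValueError("literal exceeds 32 bytes")
    return int.from_bytes(data.ljust(WORD_BYTES, b"\x00"), "big")


def string_literal_word(body: str) -> int:
    return bytes_to_word(string_literal_bytes(body))


def hex_literal_word(digits: str) -> int:
    # hex"...": each pair of contiguous hex digits is one byte
    return bytes_to_word(bytes.fromhex(digits))


def number_literal_word(text: str) -> int:
    value = int(text, 16) if text.startswith("0x") else int(text, 10)
    if value >= 2**256:
        raise ValueError("number literal must be less than 2**256")
    return value
```

For instance, string_literal_word("abc") equals hex_literal_word("616263"), and because the string is left-aligned its low-order 29 bytes are zero, so and("abc", 5) evaluates to 0.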
When compiling for the EVM, this will be translated into an appropriate PUSHi instruction. In the following example, 3 and 2 are added resulting in 5 and then the bitwise and with the string “abc” is computed. The final value is assigned to a local variable called x.
let x := and("abc", add(3, 2))
The 32-byte limit above does not apply to string literals passed to builtin functions that require literal arguments (e.g. setimmutable or loadimmutable). Those strings never end up in the generated bytecode.
Unless it is the default type, the type of a literal has to be specified after a colon:
// This will not compile (u32 and u256 type not implemented yet)
let x := and("abc":u32, add(3:u256, 2:u256))
Function Calls
Both built-in and user-defined functions (see below) can be called in the same way as shown in the previous example. If the function returns a single value, it can be directly used inside an expression again. If it returns multiple values, they have to be assigned to local variables.
function f(x, y) -> a, b { /* ... */ }
mstore(0x80, add(mload(0x80), 3))
// Here, the user-defined function `f` returns two values.
let x, y := f(1, mload(0))
For built-in functions of the EVM, functional expressions can be directly translated to a stream of opcodes: You just read the expression from right to left to obtain the opcodes. In the case of the mstore line in the example, this is PUSH1 3 PUSH1 0x80 MLOAD ADD PUSH1 0x80 MSTORE.
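The right-to-left reading can be expressed as a small recursive function. The Python sketch below assumes expressions are encoded as nested tuples (an encoding invented for this example) and always picks the narrowest PUSHn; the real code generator and optimizer may emit different, equivalent bytecode.

```python
from typing import Union

# An expression is an integer literal or a call tuple: (name, arg1, ..., argN).
Expr = Union[int, tuple]


def push_opcode(value: int) -> str:
    """Narrowest PUSHn holding `value` (this sketch uses PUSH1 for zero)."""
    width = max(1, (value.bit_length() + 7) // 8)
    return f"PUSH{width} {hex(value)}"


def to_opcodes(expr: Expr) -> list:
    """Flatten a functional EVM expression by reading it right to left."""
    if isinstance(expr, int):
        return [push_opcode(expr)]
    name, *args = expr
    ops = []
    for arg in reversed(args):   # the rightmost argument is pushed first
        ops += to_opcodes(arg)
    ops.append(name.upper())     # the opcode then consumes its arguments
    return ops
```

Applied to ("mstore", 0x80, ("add", ("mload", 0x80), 3)), this produces the same sequence as the text: PUSH1 3, PUSH1 0x80, MLOAD, ADD, PUSH1 0x80, MSTORE.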
For calls to user-defined functions, the arguments are also put on the stack from right to left and this is the order in which argument lists are evaluated. The return values, though, are expected on the stack from left to right, i.e. in this example, y is on top of the stack and x is below it.
Variable Declarations
You can use the let keyword to declare variables. A variable is only visible inside the {...}-block it was defined in. When compiling to the EVM, a new stack slot is created that is reserved for the variable and automatically removed again when the end of the block is reached. You can provide an initial value for the variable. If you do not provide a value, the variable will be initialized to zero.
Since variables are stored on the stack, they do not directly influence memory or storage, but they can be used as pointers to memory or storage locations in the built-in functions mstore, mload, sstore and sload. Future dialects might introduce specific types for such pointers.
When a variable is referenced, its current value is copied. For the EVM, this translates to a DUP instruction.
{
    let zero := 0
    let v := calldataload(zero)
    {
        let y := add(sload(v), 1)
        v := y
    } // y is "deallocated" here
    sstore(v, zero)
} // v and zero are "deallocated" here
If the declared variable should have a type different from the default type, you denote that following a colon. You can also declare multiple variables in one statement when you assign from a function call that returns multiple values.
// This will not compile (u32 and u256 type not implemented yet)
{
    let zero:u32 := 0:u32
    let v:u256, t:u32 := f()
    let x, y := g()
}
Depending on the optimiser settings, the compiler can free the stack slots already after the variable has been used for the last time, even though it is still in scope.
Assignments
Variables can be assigned to after their definition using the := operator. It is possible to assign multiple variables at the same time. For this, the number and types of the values have to match. If you want to assign the values returned from a function that has multiple return parameters, you have to provide multiple variables. The same variable may not occur multiple times on the left-hand side of an assignment, e.g. x, x := f() is invalid.
let v := 0
// re-assign v
v := 2
let t := add(v, 2)
function f() -> a, b { }
// assign multiple values
v, t := f()
If
The if statement can be used for conditionally executing code. No “else” block can be defined. Consider using “switch” instead (see below) if you need multiple alternatives.
if lt(calldatasize(), 4) { revert(0, 0) }
The curly braces for the body are required.
Switch
You can use a switch statement as an extended version of the if statement. It takes the value of an expression and compares it to several literal constants. The branch corresponding to the matching constant is taken. Contrary to other programming languages, for safety reasons, control flow does not continue from one case to the next. There can be a fallback or default case called default which is taken if none of the literal constants matches.
{
    let x := 0
    switch calldataload(4)
    case 0 {
        x := calldataload(0x24)
    }
    default {
        x := calldataload(0x44)
    }
    sstore(0, div(x, 2))
}
The list of cases is not enclosed by curly braces, but the body of a case does require them.
Loops
Yul supports for-loops which consist of a header containing an initializing part, a condition, a post-iteration part and a body. The condition has to be an expression, while the other three are blocks. If the initializing part declares any variables at the top level, the scope of these variables extends to all other parts of the loop.
The break and continue statements can be used in the body to exit the loop or skip to the post-part, respectively.
The following example computes the sum of an area in memory.
{
    let x := 0
    for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } {
        x := add(x, mload(i))
    }
}
For loops can also be used as a replacement for while loops: Simply leave the initialization and post-iteration parts empty.
{
    let x := 0
    let i := 0
    for { } lt(i, 0x100) { } {     // while(i < 0x100)
        x := add(x, mload(i))
        i := add(i, 0x20)
    }
}
Function Declarations
Yul allows the definition of functions. These should not be confused with functions in Solidity since they are never part of an external interface of a contract and are part of a namespace separate from the one for Solidity functions.
For the EVM, Yul functions take their arguments (and a return PC) from the stack and also put the results onto the stack. User-defined functions and built-in functions are called in exactly the same way.
Functions can be defined anywhere and are visible in the block they are declared in. Inside a function, you cannot access local variables defined outside of that function.
Functions declare parameters and return variables, similar to Solidity. To return a value, you assign it to the return variable(s).
If you call a function that returns multiple values, you have to assign them to multiple variables using a, b := f(x) or let a, b := f(x).
The leave statement can be used to exit the current function. It works like the return statement in other languages, except that it does not take a value to return: it just exits the function, which then returns whatever values are currently assigned to the return variable(s).
Note that the EVM dialect has a built-in function called return that quits the full execution context (internal message call) and not just the current Yul function.
The following example implements the power function by square-and-multiply.
{
    function power(base, exponent) -> result
    {
        switch exponent
        case 0 { result := 1 }
        case 1 { result := base }
        default
        {
            result := power(mul(base, base), div(exponent, 2))
            switch mod(exponent, 2)
                case 1 { result := mul(base, result) }
        }
    }
}
Specification of Yul
This chapter describes Yul code formally. Yul code is usually placed inside Yul objects, which are explained in their own chapter.
Block = '{' Statement* '}'
Statement =
    Block |
    FunctionDefinition |
    VariableDeclaration |
    Assignment |
    If |
    Expression |
    Switch |
    ForLoop |
    BreakContinue |
    Leave
FunctionDefinition =
    'function' Identifier '(' TypedIdentifierList? ')'
    ( '->' TypedIdentifierList )? Block
VariableDeclaration =
    'let' TypedIdentifierList ( ':=' Expression )?
Assignment =
    IdentifierList ':=' Expression
Expression =
    FunctionCall | Identifier | Literal
If =
    'if' Expression Block
Switch =
    'switch' Expression ( Case+ Default? | Default )
Case =
    'case' Literal Block
Default =
    'default' Block
ForLoop =
    'for' Block Expression Block Block
BreakContinue =
    'break' | 'continue'
Leave = 'leave'
FunctionCall =
    Identifier '(' ( Expression ( ',' Expression )* )? ')'
Identifier = [a-zA-Z_$] [a-zA-Z_$0-9.]*
IdentifierList = Identifier ( ',' Identifier)*
TypeName = Identifier
TypedIdentifierList = Identifier ( ':' TypeName )? ( ',' Identifier ( ':' TypeName )? )*
Literal =
    (NumberLiteral | StringLiteral | TrueLiteral | FalseLiteral) ( ':' TypeName )?
NumberLiteral = HexNumber | DecimalNumber
StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"'
TrueLiteral = 'true'
FalseLiteral = 'false'
HexNumber = '0x' [0-9a-fA-F]+
DecimalNumber = [0-9]+
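The lexical rules at the end of the grammar translate directly into regular expressions. The following Python sketch is a hypothetical tokenizer built from them; it does not handle comments or hex string literals (this grammar lists neither), and the token kind names are made up for the example.

```python
import re

# Token patterns transcribed from the lexical rules of the grammar.
# Order matters: Python's regex alternation tries alternatives left to right.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("HEX", r"0x[0-9a-fA-F]+"),
    ("DEC", r"[0-9]+"),
    ("STRING", r'"(?:[^"\r\n\\]|\\.)*"'),
    ("ASSIGN", r":="),
    ("ARROW", r"->"),
    ("PUNCT", r"[{}(),:]"),
    ("IDENT", r"[a-zA-Z_$][a-zA-Z_$0-9.]*"),
]
KEYWORDS = {"function", "let", "if", "switch", "case", "default", "for",
            "break", "continue", "leave", "true", "false"}
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list:
    """Split Yul source into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        if kind != "WS":
            tokens.append((kind, text))
        pos = match.end()
    return tokens
```

Note how the Identifier rule lets a dot appear after the first character, so x.y is lexed as a single identifier.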
Restrictions on the Grammar
Apart from those directly imposed by the grammar, the following restrictions apply:
Switches must have at least one case (including the default case). All case values need to have the same type and distinct values. If all possible values of the expression type are covered, a default case is not allowed (i.e. a switch with a bool expression that has both a true and a false case does not allow a default case).
Every expression evaluates to zero or more values. Identifiers and Literals evaluate to exactly one value and function calls evaluate to a number of values equal to the number of return variables of the function called.
In variable declarations and assignments, the right-hand-side expression (if present) has to evaluate to a number of values equal to the number of variables on the left-hand-side. This is the only situation where an expression evaluating to more than one value is allowed. The same variable name cannot occur more than once in the left-hand-side of an assignment or variable declaration.
Expressions that are also statements (i.e. at the block level) have to evaluate to zero values.
In all other situations, expressions have to evaluate to exactly one value.
The continue and break statements can only be used inside loop bodies and have to be in the same function as the loop (or both have to be at the top level). The continue and break statements cannot be used in other parts of a loop, not even when it is scoped inside a second loop’s body.
The condition part of the for-loop has to evaluate to exactly one value.
The leave statement can only be used inside a function.
Functions cannot be defined anywhere inside for loop init blocks.
Literals cannot be larger than their type. The largest type defined is 256-bit wide.
During assignments and function calls, the types of the respective values have to match. There is no implicit type conversion. Type conversion in general can only be achieved if the dialect provides an appropriate built-in function that takes a value of one type and returns a value of a different type.
Scoping Rules
Scopes in Yul are tied to Blocks (exceptions are functions and the for loop as explained below) and all declarations (FunctionDefinition, VariableDeclaration) introduce new identifiers into these scopes.
Identifiers are visible in the block they are defined in (including all sub-nodes and sub-blocks): Functions are visible in the whole block (even before their definitions) while variables are only visible starting from the statement after the VariableDeclaration.
In particular, variables cannot be referenced in the right hand side of their own variable declaration. Functions can be referenced already before their declaration (if they are visible).
As an exception to the general scoping rule, the scope of the “init” part of the for-loop (the first block) extends across all other parts of the for loop. This means that variables (and functions) declared in the init part (but not inside a block inside the init part) are visible in all other parts of the for-loop.
Identifiers declared in the other parts of the for loop respect the regular syntactical scoping rules.
This means a for-loop of the form for { I... } C { P... } { B... } is equivalent to { I... for {} C { P... } { B... } }.
The parameters and return parameters of functions are visible in the function body and their names have to be distinct.
Inside functions, it is not possible to reference a variable that was declared outside of that function.
Shadowing is disallowed, i.e. you cannot declare an identifier at a point where another identifier with the same name is also visible, even if it is not possible to reference it because it was declared outside the current function.
Formal Specification
We formally specify Yul by providing an evaluation function E overloaded on the various nodes of the AST. As builtin functions can have side effects, E takes two state objects and the AST node and returns two new state objects and a variable number of other values. The two state objects are the global state object (which in the context of the EVM is the memory, storage and state of the blockchain) and the local state object (the state of local variables, i.e. a segment of the stack in the EVM).
If the AST node is a statement, E returns the two state objects and a “mode”, which is used for the break, continue and leave statements. If the AST node is an expression, E returns the two state objects and as many values as the expression evaluates to.
The exact nature of the global state is unspecified for this high level description. The local state L is a mapping of identifiers i to values v, denoted as L[i] = v.
For an identifier v, let $v be the name of the identifier.
We will use a destructuring notation for the AST nodes.
E(G, L, <{St1, ..., Stn}>: Block) =
    let G1, L1, mode = E(G, L, St1, ..., Stn)
    let L2 be a restriction of L1 to the identifiers of L
    G1, L2, mode
E(G, L, St1, ..., Stn: Statement) =
    if n is zero:
        G, L, regular
    else:
        let G1, L1, mode = E(G, L, St1)
        if mode is regular then
            E(G1, L1, St2, ..., Stn)
        otherwise
            G1, L1, mode
E(G, L, FunctionDefinition) =
    G, L, regular
E(G, L, <let var_1, ..., var_n := rhs>: VariableDeclaration) =
    E(G, L, <var_1, ..., var_n := rhs>: Assignment)
E(G, L, <let var_1, ..., var_n>: VariableDeclaration) =
    let L1 be a copy of L where L1[$var_i] = 0 for i = 1, ..., n
    G, L1, regular
E(G, L, <var_1, ..., var_n := rhs>: Assignment) =
    let G1, L1, v1, ..., vn = E(G, L, rhs)
    let L2 be a copy of L1 where L2[$var_i] = vi for i = 1, ..., n
    G1, L2, regular
E(G, L, <for { i1, ..., in } condition post body>: ForLoop) =
    if n >= 1:
        let G1, L1, mode = E(G, L, i1, ..., in)
        // mode has to be regular or leave due to the syntactic restrictions
        if mode is leave then
            G1, L1 restricted to variables of L, leave
        otherwise
            let G2, L2, mode = E(G1, L1, for {} condition post body)
            G2, L2 restricted to variables of L, mode
    else:
        let G1, L1, v = E(G, L, condition)
        if v is false:
            G1, L1, regular
        else:
            let G2, L2, mode = E(G1, L1, body)
            if mode is break:
                G2, L2, regular
            otherwise if mode is leave:
                G2, L2, leave
            else:
                G3, L3, mode = E(G2, L2, post)
                if mode is leave:
                    G3, L3, leave
                otherwise
                    E(G3, L3, for {} condition post body)
E(G, L, break: BreakContinue) =
    G, L, break
E(G, L, continue: BreakContinue) =
    G, L, continue
E(G, L, leave: Leave) =
    G, L, leave
E(G, L, <if condition body>: If) =
    let G0, L0, v = E(G, L, condition)
    if v is true:
        E(G0, L0, body)
    else:
        G0, L0, regular
E(G, L, <switch condition case l1:t1 st1 ... case ln:tn stn>: Switch) =
    E(G, L, switch condition case l1:t1 st1 ... case ln:tn stn default {})
E(G, L, <switch condition case l1:t1 st1 ... case ln:tn stn default st'>: Switch) =
    let G0, L0, v = E(G, L, condition)
    // i = 1 .. n
    // Evaluate literals, context doesn't matter
    let _, _, v1 = E(G0, L0, l1)
    ...
    let _, _, vn = E(G0, L0, ln)
    if there exists smallest i such that vi = v:
        E(G0, L0, sti)
    else:
        E(G0, L0, st')
E(G, L, <name>: Identifier) =
    G, L, L[$name]
E(G, L, <fname(arg1, ..., argn)>: FunctionCall) =
    G1, L1, vn = E(G, L, argn)
    ...
    G(n-1), L(n-1), v2 = E(G(n-2), L(n-2), arg2)
    Gn, Ln, v1 = E(G(n-1), L(n-1), arg1)
    Let <function fname (param1, ..., paramn) -> ret1, ..., retm block>
    be the function of name $fname visible at the point of the call.
    Let L' be a new local state such that
    L'[$parami] = vi and L'[$reti] = 0 for all i.
    Let G'', L'', mode = E(Gn, L', block)
    G'', Ln, L''[$ret1], ..., L''[$retm]
E(G, L, l: StringLiteral) = G, L, utf8EncodeLeftAligned(l),
    where utf8EncodeLeftAligned performs a UTF-8 encoding of l
    and aligns it left into 32 bytes
E(G, L, n: HexNumber) = G, L, hex(n)
    where hex is the hexadecimal decoding function
E(G, L, n: DecimalNumber) = G, L, dec(n),
    where dec is the decimal decoding function
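To see how the modes interact, the Python below interprets a small subset of these rules: blocks, variable declarations, assignments, if, switch, for with break and continue, and a handful of builtins. It is a sketch under explicit assumptions: the tuple-based AST is invented for the example, user-defined functions and leave are omitted, and the global state G is a dictionary of storage slots that is mutated in place instead of being returned.

```python
WORD = 2**256


def _sstore(g, p, v):
    g[p] = v          # side effect on the global state; yields no values
    return []


# Builtins receive the global state first and return a list of result values.
BUILTINS = {
    "add": lambda g, a, b: [(a + b) % WORD],
    "mul": lambda g, a, b: [(a * b) % WORD],
    "lt": lambda g, a, b: [int(a < b)],
    "eq": lambda g, a, b: [int(a == b)],
    "sstore": _sstore,
}


def eval_expr(g, local, expr):
    """E on expressions: return the list of values the expression evaluates to."""
    kind = expr[0]
    if kind == "lit":
        return [expr[1]]
    if kind == "id":
        return [local[expr[1]]]
    _, name, args = expr                      # ("call", name, [args])
    values = [0] * len(args)
    for i in reversed(range(len(args))):      # arguments are evaluated right to left
        (values[i],) = eval_expr(g, local, args[i])
    return BUILTINS[name](g, *values)


def exec_stmts(g, local, stmts):
    """E on a statement list: stop at the first non-regular mode."""
    for stmt in stmts:
        mode = exec_stmt(g, local, stmt)
        if mode != "regular":
            return mode
    return "regular"


def exec_block(g, local, stmts):
    """E on a block: variables declared inside are dropped afterwards."""
    inner = dict(local)
    mode = exec_stmts(g, inner, stmts)
    for name in local:                        # restrict L1 to the identifiers of L
        local[name] = inner[name]
    return mode


def exec_stmt(g, local, stmt):
    kind = stmt[0]
    if kind == "block":
        return exec_block(g, local, stmt[1])
    if kind in ("let", "assign"):
        _, names, rhs = stmt
        values = [0] * len(names) if rhs is None else eval_expr(g, local, rhs)
        local.update(zip(names, values))
        return "regular"
    if kind == "expr":
        if eval_expr(g, local, stmt[1]) != []:
            raise ValueError("expression statements must evaluate to zero values")
        return "regular"
    if kind == "if":
        (cond,) = eval_expr(g, local, stmt[1])
        return exec_block(g, local, stmt[2]) if cond != 0 else "regular"
    if kind == "switch":
        _, expr, cases, default = stmt
        (value,) = eval_expr(g, local, expr)
        for literal, body in cases:           # first match wins, no fall-through
            if literal == value:
                return exec_block(g, local, body)
        return exec_block(g, local, default or [])
    if kind == "for":
        _, init, cond, post, body = stmt
        scope = dict(local)
        exec_stmts(g, scope, init)            # init variables stay visible in all parts
        while eval_expr(g, scope, cond) != [0]:
            if exec_block(g, scope, body) == "break":
                break
            exec_block(g, scope, post)        # also reached after "continue"
        for name in local:
            local[name] = scope[name]
        return "regular"
    if kind in ("break", "continue"):
        return kind
    raise ValueError(f"unsupported statement {kind!r}")
```

For example, a loop over i from 0 to 9 that continues at i = 3 and breaks at i = 6 accumulates 0 + 1 + 2 + 4 + 5 = 12 before a final sstore.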
EVM Dialect
The default dialect of Yul currently is the EVM dialect for the currently selected version of the EVM. The only type available in this dialect is u256, the 256-bit native type of the Ethereum Virtual Machine. Since it is the default type of this dialect, it can be omitted.
The following table lists all builtin functions (depending on the EVM version) and provides a short description of the semantics of the function / opcode. This document does not want to be a full description of the Ethereum virtual machine. Please refer to a different document if you are interested in the precise semantics.
Opcodes marked with - do not return a result and all others return exactly one value. Opcodes marked with F, H, B, C, I and L are present since Frontier, Homestead, Byzantium, Constantinople, Istanbul or London respectively.
In the following, mem[a...b) signifies the bytes of memory starting at position a up to but not including position b and storage[p] signifies the storage contents at slot p.
Since Yul manages local variables and control-flow, opcodes that interfere with these features are not available. This includes the dup and swap instructions as well as jump instructions, labels and the push instructions.
Instruction | Result | Since | Explanation |
---|---|---|---|
stop() | - | F | stop execution, identical to return(0, 0) |
add(x, y) | | F | x + y |
sub(x, y) | | F | x - y |
mul(x, y) | | F | x * y |
div(x, y) | | F | x / y or 0 if y == 0 |
sdiv(x, y) | | F | x / y, for signed numbers in two’s complement, 0 if y == 0 |
mod(x, y) | | F | x % y, 0 if y == 0 |
smod(x, y) | | F | x % y, for signed numbers in two’s complement, 0 if y == 0 |
exp(x, y) | | F | x to the power of y |
not(x) | | F | bitwise “not” of x (every bit of x is negated) |
lt(x, y) | | F | 1 if x < y, 0 otherwise |
gt(x, y) | | F | 1 if x > y, 0 otherwise |
slt(x, y) | | F | 1 if x < y, 0 otherwise, for signed numbers in two’s complement |
sgt(x, y) | | F | 1 if x > y, 0 otherwise, for signed numbers in two’s complement |
eq(x, y) | | F | 1 if x == y, 0 otherwise |
iszero(x) | | F | 1 if x == 0, 0 otherwise |
and(x, y) | | F | bitwise “and” of x and y |
or(x, y) | | F | bitwise “or” of x and y |
xor(x, y) | | F | bitwise “xor” of x and y |
byte(n, x) | | F | nth byte of x, where the most significant byte is the 0th byte |
shl(x, y) | | C | logical shift left y by x bits |
shr(x, y) | | C | logical shift right y by x bits |
sar(x, y) | | C | signed arithmetic shift right y by x bits |
addmod(x, y, m) | | F | (x + y) % m with arbitrary precision arithmetic, 0 if m == 0 |
mulmod(x, y, m) | | F | (x * y) % m with arbitrary precision arithmetic, 0 if m == 0 |
signextend(i, x) | | F | sign extend from (i*8+7)th bit counting from least significant |
keccak256(p, n) | | F | keccak(mem[p…(p+n))) |
pc() | | F | current position in code |
pop(x) | - | F | discard value x |
mload(p) | | F | mem[p…(p+32)) |
mstore(p, v) | - | F | mem[p…(p+32)) := v |
mstore8(p, v) | - | F | mem[p] := v & 0xff (only modifies a single byte) |
sload(p) | | F | storage[p] |
sstore(p, v) | - | F | storage[p] := v |
msize() | | F | size of memory, i.e. largest accessed memory index |
gas() | | F | gas still available to execution |
address() | | F | address of the current contract / execution context |
balance(a) | | F | wei balance at address a |
selfbalance() | | I | equivalent to balance(address()), but cheaper |
caller() | | F | call sender (excluding delegatecall) |
callvalue() | | F | wei sent together with the current call |
calldataload(p) | | F | call data starting from position p (32 bytes) |
calldatasize() | | F | size of call data in bytes |
calldatacopy(t, f, s) | - | F | copy s bytes from calldata at position f to mem at position t |
codesize() | | F | size of the code of the current contract / execution context |
codecopy(t, f, s) | - | F | copy s bytes from code at position f to mem at position t |
extcodesize(a) | | F | size of the code at address a |
extcodecopy(a, t, f, s) | - | F | like codecopy(t, f, s) but take code at address a |
returndatasize() | | B | size of the last returndata |
returndatacopy(t, f, s) | - | B | copy s bytes from returndata at position f to mem at position t |
extcodehash(a) | | C | code hash of address a |
create(v, p, n) | | F | create new contract with code mem[p…(p+n)) and send v wei and return the new address; returns 0 on error |
create2(v, p, n, s) | | C | create new contract with code mem[p…(p+n)) at address keccak256(0xff . this . s . keccak256(mem[p…(p+n))) and send v wei and return the new address, where 0xff is a 1-byte value, this is the current contract’s address as a 20-byte value and s is a big-endian 256-bit value; returns 0 on error |
call(g, a, v, in, insize, out, outsize) | | F | call contract at address a with input mem[in…(in+insize)) providing g gas and v wei and output area mem[out…(out+outsize)) returning 0 on error (e.g. out of gas) and 1 on success |
callcode(g, a, v, in, insize, out, outsize) | | F | identical to call but only use the code from a and stay in the context of the current contract otherwise |
delegatecall(g, a, in, insize, out, outsize) | | H | identical to callcode but also keep caller and callvalue |
staticcall(g, a, in, insize, out, outsize) | | B | identical to call(g, a, 0, in, insize, out, outsize) but do not allow state modifications |
return(p, s) | - | F | end execution, return data mem[p…(p+s)) |
revert(p, s) | - | B | end execution, revert state changes, return data mem[p…(p+s)) |
selfdestruct(a) | - | F | end execution, destroy current contract and send funds to a |
invalid() | - | F | end execution with invalid instruction |
log0(p, s) | - | F | log without topics and data mem[p…(p+s)) |
log1(p, s, t1) | - | F | log with topic t1 and data mem[p…(p+s)) |
log2(p, s, t1, t2) | - | F | log with topics t1, t2 and data mem[p…(p+s)) |
log3(p, s, t1, t2, t3) | - | F | log with topics t1, t2, t3 and data mem[p…(p+s)) |
log4(p, s, t1, t2, t3, t4) | - | F | log with topics t1, t2, t3, t4 and data mem[p…(p+s)) |
chainid() | | I | ID of the executing chain (EIP-1344) |
basefee() | | L | current block’s base fee (EIP-3198 and EIP-1559) |
origin() | | F | transaction sender |
gasprice() | | F | gas price of the transaction |
blockhash(b) | | F | hash of block nr b - only for last 256 blocks excluding current |
coinbase() | | F | current mining beneficiary |
timestamp() | | F | timestamp of the current block in seconds since the epoch |
number() | | F | current block number |
difficulty() | | F | difficulty of the current block |
gaslimit() | | F | block gas limit of the current block |
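Several table entries depend on reading a u256 word as a two's complement number, or on edge cases such as shifting by 256 or more bits. The Python helpers below model those opcodes as a sketch for experimenting with such cases; the helper names to_signed and to_word are invented, and the out-of-range behaviour follows the EVM definitions of these opcodes (for the shifts, EIP-145).

```python
WORD = 2**256
SIGN_BIT = 2**255


def to_signed(x: int) -> int:
    """Interpret a u256 word as a two's complement signed integer."""
    return x - WORD if x >= SIGN_BIT else x


def to_word(x: int) -> int:
    """Wrap any Python integer into a u256 word."""
    return x % WORD


def sdiv(x: int, y: int) -> int:
    if y == 0:
        return 0
    a, b = to_signed(x), to_signed(y)
    q = abs(a) // abs(b)                      # the EVM rounds towards zero
    return to_word(q if (a < 0) == (b < 0) else -q)


def smod(x: int, y: int) -> int:
    if y == 0:
        return 0
    a, b = to_signed(x), to_signed(y)
    r = abs(a) % abs(b)                       # result takes the sign of x
    return to_word(-r if a < 0 else r)


def slt(x: int, y: int) -> int:
    return int(to_signed(x) < to_signed(y))


def byte(n: int, x: int) -> int:
    """nth byte of x, counting from the most significant byte (0 if n >= 32)."""
    return (x >> (8 * (31 - n))) & 0xFF if n < 32 else 0


def shl(shift: int, value: int) -> int:
    return to_word(value << shift) if shift < 256 else 0


def shr(shift: int, value: int) -> int:
    return value >> shift if shift < 256 else 0


def sar(shift: int, value: int) -> int:
    # Python's >> on negative ints is an arithmetic shift, matching sar.
    return to_word(to_signed(value) >> min(shift, 255))


def signextend(i: int, x: int) -> int:
    """Sign extend from bit (i*8+7), counting from the least significant bit."""
    if i >= 31:
        return x
    bit = i * 8 + 7
    low = x & ((1 << (bit + 1)) - 1)
    return to_word(low - (1 << (bit + 1))) if (low >> bit) & 1 else low
```

Note that the argument order follows the table: the shift amount comes first in shl, shr and sar.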
Note
The call* instructions use the out and outsize parameters to define an area in memory where the return or failure data is placed. This area is written to depending on how many bytes the called contract returns. If it returns more data, only the first outsize bytes are written. You can access the rest of the data using the returndatacopy opcode. If it returns less data, then the remaining bytes are not touched at all. You need to use the returndatasize opcode to check which part of this memory area contains the return data. The remaining bytes will retain their values as of before the call.
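The rules in this note can be modeled as follows. The Python sketch treats memory as a bytearray that is already large enough (memory expansion and gas are ignored); the function names are hypothetical.

```python
def apply_call_output(memory: bytearray, out: int, outsize: int, returned: bytes) -> bytes:
    """Copy return data into mem[out...(out+outsize)) as the call* opcodes do.

    Only min(outsize, len(returned)) bytes are written; the rest of the output
    area keeps its previous contents. The full data is returned as the new
    returndata buffer, whose length is what returndatasize() would report.
    """
    written = min(outsize, len(returned))
    memory[out : out + written] = returned[:written]
    return bytes(returned)


def returndatacopy(memory: bytearray, returndata: bytes, t: int, f: int, s: int) -> None:
    """Copy s bytes of returndata from position f to memory position t."""
    if f + s > len(returndata):
        # the EVM treats an out-of-bounds read here as an exceptional halt (EIP-211)
        raise ValueError("returndatacopy out of bounds")
    memory[t : t + s] = returndata[f : f + s]
```

This makes the practical consequence visible: after a call, only the first returndatasize() bytes of the output area are guaranteed to hold return data.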
In some internal dialects, there are additional functions:
datasize, dataoffset, datacopy
The functions datasize(x), dataoffset(x) and datacopy(t, f, l) are used to access other parts of a Yul object.
datasize and dataoffset can only take string literals (the names of other objects) as arguments and return the size and offset in the data area, respectively. For the EVM, the datacopy function is equivalent to codecopy.
setimmutable, loadimmutable
The functions setimmutable(offset, "name", value) and loadimmutable("name") are used for the immutable mechanism in Solidity and do not nicely map to pure Yul. The call to setimmutable(offset, "name", value) assumes that the runtime code of the contract containing the given named immutable was copied to memory at offset offset and will write value to all positions in memory (relative to offset) that contain the placeholder that was generated for calls to loadimmutable("name") in the runtime code.
linkersymbol
The function linkersymbol("fq_library_name") is a placeholder for an address literal to be substituted by the linker. Its first and only argument must be a string literal and represents the fully qualified library name used with the --libraries option.
For example, this code
let a := linkersymbol("file.sol:Math")
is equivalent to
let a := 0x1234567890123456789012345678901234567890
when the linker is invoked with the --libraries "file.sol:Math=0x1234567890123456789012345678901234567890" option.
See Using the Commandline Compiler for details about the Solidity linker.
memoryguard
This function is available in the EVM dialect with objects. The caller of let ptr := memoryguard(size) (where size has to be a literal number) promises that they only use memory in either the range [0, size) or the unbounded range starting at ptr.
Since the presence of a memoryguard call indicates that all memory access adheres to this restriction, it allows the optimizer to perform additional optimization steps, for example the stack limit evader, which attempts to move stack variables that would otherwise be unreachable to memory.
The Yul optimizer promises to only use the memory range [size, ptr) for its purposes. If the optimizer does not need to reserve any memory, it holds that ptr == size.
`memoryguard` can be called multiple times, but needs to have the same literal as argument within one Yul subobject. If at least one `memoryguard` call is found in a subobject, the additional optimizer steps will be run on it.
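A small Python model of the memory contract that `memoryguard` establishes. The `reserved` parameter is a hypothetical stand-in for however much memory the optimizer decides to claim; in reality the optimizer makes that choice itself.

```python
def memoryguard(sizes_in_subobject: list[int], reserved: int) -> int:
    """Return ptr for memoryguard(size), given the literals of all calls in one subobject."""
    if len(set(sizes_in_subobject)) != 1:
        raise ValueError("memoryguard needs the same literal argument within one subobject")
    size = sizes_in_subobject[0]
    return size + reserved  # the optimizer owns [size, ptr); ptr == size if nothing is reserved


def owner_of(address: int, size: int, ptr: int) -> str:
    """Classify a memory address under the memoryguard(size) promise."""
    if address < size:
        return "user"       # [0, size)
    if address < ptr:
        return "optimizer"  # [size, ptr)
    return "user"           # [ptr, unbounded)
```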
verbatim
The set of `verbatim...` builtin functions lets you create bytecode for opcodes that are not known to the Yul compiler. It also allows you to create bytecode sequences that will not be modified by the optimizer.
The functions are `verbatim_<n>i_<m>o("<data>", ...)`, where

`n` is a decimal between 0 and 99 that specifies the number of input stack slots / variables
`m` is a decimal between 0 and 99 that specifies the number of output stack slots / variables
`data` is a string literal that contains the sequence of bytes
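As a sketch of the naming scheme, this Python helper extracts `n` and `m` from a builtin name and rejects names outside the 0 to 99 range. `parse_verbatim` is a hypothetical name used only here.

```python
import re

# verbatim_<n>i_<m>o with n and m written as one- or two-digit decimals (0..99)
VERBATIM_NAME = re.compile(r"verbatim_([0-9]{1,2})i_([0-9]{1,2})o")


def parse_verbatim(name: str) -> tuple[int, int]:
    """Return (n, m) for a name such as 'verbatim_1i_1o'; raise ValueError otherwise."""
    match = VERBATIM_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a verbatim builtin: {name!r}")
    return int(match.group(1)), int(match.group(2))
```

For `verbatim_1i_1o` this gives `(1, 1)`, so the verbatim code must leave the stack height unchanged (`m - n == 0`).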
If you, for example, want to define a function that multiplies the input by two, without the optimizer touching the constant two, you can use

```yul
let x := calldataload(0)
let double := verbatim_1i_1o(hex"600202", x)
```
This code will result in a `dup1` opcode to retrieve `x` (the optimizer might directly reuse the result of the `calldataload` opcode, though), directly followed by `600202`. The code is assumed to consume the copied value of `x` and produce the result on the top of the stack. The compiler then generates code to allocate a stack slot for `double` and store the result there.
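To see why `hex"600202"` doubles its input, the bytes can be read as `PUSH1 0x02` (`0x60 0x02`) followed by `MUL` (`0x02`). The toy Python interpreter below models only those two opcodes (an assumption for brevity) and treats the last list element as the top of the stack.

```python
UINT256 = 2**256


def run(code: bytes, stack: list[int]) -> list[int]:
    """Execute `code` on `stack` in place (top of stack = last element) and return it."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x60:  # PUSH1: push the following byte
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x02:  # MUL: pop two words, push their product modulo 2**256
            a, b = stack.pop(), stack.pop()
            stack.append((a * b) % UINT256)
            pc += 1
        else:
            raise NotImplementedError(f"opcode 0x{op:02x} is not modelled")
    return stack
```

The stack holds one value before and one value after, which matches the `verbatim_1i_1o` signature.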
As with all opcodes, the arguments are arranged on the stack with the leftmost argument on the top, while the return values are assumed to be laid out such that the rightmost variable is at the top of the stack.
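The layout rule can be written down with a Python list whose last element is the top of the stack, a modelling choice made only for this sketch. `push_arguments` and `pop_returns` are hypothetical helpers.

```python
def push_arguments(stack: list, args: list) -> None:
    """Push call arguments so that the leftmost argument ends up on top."""
    for arg in reversed(args):
        stack.append(arg)


def pop_returns(stack: list, m: int) -> tuple:
    """Pop m return values, where the rightmost return variable was on top."""
    values = stack[len(stack) - m:]
    del stack[len(stack) - m:]
    return tuple(values)
```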
Since `verbatim` can be used to generate arbitrary opcodes or even opcodes unknown to the Solidity compiler, care has to be taken when using `verbatim` together with the optimizer. Even when the optimizer is switched off, the code generator has to determine the stack layout. This means that, for example, using `verbatim` to modify the stack height can lead to undefined behaviour.
The following is a non-exhaustive list of restrictions on verbatim bytecode that are not checked by the compiler. Violations of these restrictions can result in undefined behaviour.
Control flow should not jump into or out of verbatim blocks, but it can jump within the same verbatim block.
Stack contents apart from the input and output parameters should not be accessed.
The stack height difference should be exactly `m - n` (output slots minus input slots).
Verbatim bytecode cannot make any assumptions about the surrounding bytecode. All required parameters have to be passed in as stack variables.
The optimizer does not analyze verbatim bytecode and always assumes that it modifies all aspects of state. It can therefore only do very few optimizations across `verbatim` function calls.
The optimizer treats verbatim bytecode as an opaque block of code. It will not split it, but might move, duplicate or combine it with identical verbatim bytecode blocks. If a verbatim bytecode block is unreachable by the control flow, it can be removed.
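The compiler does not check these restrictions, but two of them can be checked statically for straight-line code: the net stack height change of `m - n`, and not reading below the inputs. The Python sketch below is a hypothetical checker. It models only `ADD`, `MUL`, `POP`, `DUP1` and `PUSH1` to `PUSH32`, and it rejects every other opcode (including jumps) rather than analysing it.

```python
# opcode -> (items consumed, items produced); only a handful of opcodes are modelled
STACK_EFFECT = {
    0x01: (2, 1),  # ADD
    0x02: (2, 1),  # MUL
    0x50: (1, 0),  # POP
    0x80: (1, 2),  # DUP1
}


def check_verbatim(code: bytes, n: int, m: int) -> None:
    """Raise ValueError if `code` breaks the modelled verbatim restrictions."""
    depth = n  # only the n input slots are visible to the verbatim block
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32: push one item, skip the immediate bytes
            consumed, produced = 0, 1
            pc += 1 + (op - 0x5F)
        elif op in STACK_EFFECT:
            consumed, produced = STACK_EFFECT[op]
            pc += 1
        else:
            raise ValueError(f"opcode 0x{op:02x} is not modelled")
        if consumed > depth:
            raise ValueError("accesses stack slots below the input parameters")
        depth += produced - consumed
    if depth != m:
        raise ValueError(f"stack height changes by {depth - n}, expected {m - n}")
```

The doubling example `hex"600202"` passes this check with `n = m = 1`. A lone `PUSH1` fails because it leaves the stack one item taller than declared.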
Warning
During discussions about whether or not EVM improvements might break existing smart contracts, features inside `verbatim` cannot receive the same consideration as those used by the Solidity compiler itself.
Note
To avoid confusion, all identifiers starting with the string `verbatim` are reserved and cannot be used for user-defined identifiers.
Specification of Yul Object
Yul objects are used to group named code and data sections. The functions `datasize`, `dataoffset` and `datacopy` can be used to access these sections from within code. Hex strings can be used to specify data in hex encoding, regular strings in native encoding. For code, `datacopy` will access its assembled binary representation.
```
Object = 'object' StringLiteral '{' Code ( Object | Data )* '}'
Code = 'code' Block
Data = 'data' StringLiteral ( HexLiteral | StringLiteral )
HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'')
StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"'
```
Above, `Block` refers to `Block` in the Yul code grammar explained in the previous chapter.
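The `HexLiteral` rule translates directly into a regular expression. This Python sketch validates a data literal against that rule and returns its bytes; the name `decode_hex_literal` is hypothetical.

```python
import re

# HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'')
HEX_LITERAL = re.compile(r"""hex(?:"(?:[0-9a-fA-F]{2})*"|'(?:[0-9a-fA-F]{2})*')""")


def decode_hex_literal(literal: str) -> bytes:
    """Return the bytes of a Yul hex literal such as hex"4123"."""
    if HEX_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"malformed hex literal: {literal!r}")
    return bytes.fromhex(literal[4:-1])  # drop the leading hex" (or hex') and the closing quote
```

An odd number of hex digits or mismatched quotes is rejected, just as the grammar requires complete byte pairs and matching delimiters.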
Note
Data objects or sub-objects whose names contain a `.` can be defined, but it is not possible to access them through `datasize`, `dataoffset` or `datacopy`, because `.` is used as a separator to access objects inside another object.
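The effect of using `.` as a separator can be illustrated with nested Python dictionaries, which are an assumed representation of the object tree. A dotted path is resolved one segment at a time, so an object whose own name contains `.` is never reached. `resolve` is a hypothetical helper.

```python
def resolve(obj: dict, path: str):
    """Follow a dotted path through nested objects; raise KeyError if a segment is missing."""
    node = obj
    for part in path.split("."):
        node = node[part]
    return node
```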
Note
The data object called `".metadata"` has a special meaning: it cannot be accessed from code and is always appended to the very end of the bytecode, regardless of its position in the object.

Other data objects with special significance might be added in the future, but their names will always start with a `.`.
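A simplified Python model of the `.metadata` rule. Only the "metadata goes last" behaviour comes from the text above. Representing an object as its code followed by its items in declaration order is an assumption of this sketch.

```python
def serialize(code: bytes, items: list[tuple[str, bytes]]) -> bytes:
    """Concatenate code and (name, blob) items, moving any ".metadata" item to the very end."""
    regular = [blob for name, blob in items if name != ".metadata"]
    metadata = [blob for name, blob in items if name == ".metadata"]
    return code + b"".join(regular) + b"".join(metadata)
```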
An example Yul Object is shown below:
```yul
// A contract consists of a single object with sub-objects representing
// the code to be deployed or other contracts it can create.
// The single "code" node is the executable code of the object.
// Every (other) named object or data section is serialized and
// made accessible to the special built-in functions datacopy / dataoffset / datasize
// The current object, sub-objects and data items inside the current object
// are in scope.
object "Contract1" {
    // This is the constructor code of the contract.
    code {
        function allocate(size) -> ptr {
            ptr := mload(0x40)
            if iszero(ptr) { ptr := 0x60 }
            mstore(0x40, add(ptr, size))
        }

        // first create "Contract2"
        let size := datasize("Contract2")
        let offset := allocate(size)
        // This will turn into codecopy for EVM
        datacopy(offset, dataoffset("Contract2"), size)
        // constructor parameter is a single number 0x1234
        mstore(add(offset, size), 0x1234)
        pop(create(offset, add(size, 32), 0))

        // now return the runtime object (the currently
        // executing code is the constructor code)
        size := datasize("runtime")
        offset := allocate(size)
        // This will turn into a memory->memory copy for Ewasm and
        // a codecopy for EVM
        datacopy(offset, dataoffset("runtime"), size)
        return(offset, size)
    }

    data "Table2" hex"4123"

    object "runtime" {
        code {
            function allocate(size) -> ptr {
                ptr := mload(0x40)
                if iszero(ptr) { ptr := 0x60 }
                mstore(0x40, add(ptr, size))
            }

            // runtime code

            mstore(0, "Hello, World!")
            return(0, 0x20)
        }
    }

    // Embedded object. Use case is that the outside is a factory contract,
    // and Contract2 is the code to be created by the factory
    object "Contract2" {
        code {
            // code here ...
        }

        object "runtime" {
            code {
                // code here ...
            }
        }

        data "Table1" hex"4123"
    }
}
```
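The `allocate` helper in this example implements a simple bump allocator around the free memory pointer stored at `0x40`. The Python model below mirrors it line by line. It assumes memory is a dictionary from address to word, where missing entries read as zero.

```python
FREE_MEMORY_POINTER = 0x40


def allocate(memory: dict[int, int], size: int) -> int:
    """Return the start of a fresh region and bump the free memory pointer by `size`."""
    ptr = memory.get(FREE_MEMORY_POINTER, 0)  # ptr := mload(0x40)
    if ptr == 0:                              # if iszero(ptr) { ptr := 0x60 }
        ptr = 0x60
    memory[FREE_MEMORY_POINTER] = ptr + size  # mstore(0x40, add(ptr, size))
    return ptr
```

The first call returns `0x60`, and each later call returns the end of the previous allocation.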
Yul Optimizer
The Yul optimizer operates on Yul code and uses the same language for input, output and intermediate states. This allows for easy debugging and verification of the optimizer.
Please refer to the general optimizer documentation for more details about the different optimization stages and how to use the optimizer.
If you want to use Solidity in stand-alone Yul mode, you activate the optimizer using `--optimize` and optionally specify the expected number of contract executions with `--optimize-runs`:

```shell
solc --strict-assembly --optimize --optimize-runs 200
```
In Solidity mode, the Yul optimizer is activated together with the regular optimizer.
Optimization Step Sequence
By default the Yul optimizer applies its predefined sequence of optimization steps to the generated assembly. You can override this sequence and supply your own using the `--yul-optimizations` option:

```shell
solc --optimize --ir-optimized --yul-optimizations 'dhfoD[xarrscLMcCTU]uljmul'
```
The order of steps is significant and affects the quality of the output. Moreover, applying a step may uncover new optimization opportunities for others that were already applied, so repeating steps is often beneficial. By enclosing part of the sequence in square brackets (`[]`), you tell the optimizer to repeatedly apply that part until it no longer improves the size of the resulting assembly. You can use brackets multiple times in a single sequence, but they cannot be nested.
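The bracket rules above can be sketched in Python. `parse_sequence` splits a sequence into single steps and repeated groups, and `run_sequence` repeats each group while the program keeps shrinking. Both names, the toy steps and the use of string length as the size measure are assumptions. The sketch also keeps the last smaller version when a round brings no improvement, which may differ from solc's exact stopping rule.

```python
from typing import Callable


def parse_sequence(seq: str) -> list:
    """Split a step sequence into single steps (str) and repeated groups (list of str)."""
    result, group = [], None
    for ch in seq:
        if ch == "[":
            if group is not None:
                raise ValueError("brackets cannot be nested")
            group = []
        elif ch == "]":
            if group is None:
                raise ValueError("unmatched ']'")
            result.append(group)
            group = None
        elif group is not None:
            group.append(ch)
        else:
            result.append(ch)
    if group is not None:
        raise ValueError("unmatched '['")
    return result


def run_sequence(program: str, seq: str, steps: dict[str, Callable[[str], str]]) -> str:
    """Apply steps in order; repeat each bracketed group while the program keeps shrinking."""
    for item in parse_sequence(seq):
        if isinstance(item, str):
            program = steps[item](program)
            continue
        while True:
            candidate = program
            for letter in item:
                candidate = steps[letter](candidate)
            if len(candidate) >= len(program):
                break  # no improvement in this round: keep the previous version
            program = candidate
    return program
```

As a usage example, parsing the default-style sequence `dhfoD[xarrscLMcCTU]uljmul` yields five single steps, one repeated group `xarrscLMcCTU`, and six trailing single steps.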
The following optimization steps are available:
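Abbreviation | Full name |
---|---|
`f` | `BlockFlattener` |
`l` | `CircularReferencesPruner` |
`c` | `CommonSubexpressionEliminator` |
`C` | `ConditionalSimplifier` |
`U` | `ConditionalUnsimplifier` |
`n` | `ControlFlowSimplifier` |
`D` | `DeadCodeEliminator` |
`v` | `EquivalentFunctionCombiner` |
`e` | `ExpressionInliner` |
`j` | `ExpressionJoiner` |
`s` | `ExpressionSimplifier` |
`x` | `ExpressionSplitter` |
`I` | `ForLoopConditionIntoBody` |
`O` | `ForLoopConditionOutOfBody` |
`o` | `ForLoopInitRewriter` |
`i` | `FullInliner` |
`g` | `FunctionGrouper` |
`h` | `FunctionHoister` |
`F` | `FunctionSpecializer` |
`T` | `LiteralRematerialiser` |
`L` | `LoadResolver` |
`M` | `LoopInvariantCodeMotion` |
`r` | `RedundantAssignEliminator` |
`R` | `ReasoningBasedSimplifier` (highly experimental) |
`m` | `Rematerialiser` |
`V` | `SSAReverser` |
`a` | `SSATransform` |
`t` | `StructuralSimplifier` |
`u` | `UnusedPruner` |
`p` | `UnusedFunctionParameterPruner` |
`d` | `VarDeclInitializer` |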
Some steps depend on properties ensured by `BlockFlattener`, `FunctionGrouper` and `ForLoopInitRewriter`. For this reason the Yul optimizer always applies them before applying any steps supplied by the user.
The ReasoningBasedSimplifier is an optimizer step that is currently not enabled in the default set of steps. It uses an SMT solver to simplify arithmetic expressions and boolean conditions. It has not received thorough testing or validation yet and can produce non-reproducible results, so please use it with care!
Complete ERC20 Example
```yul
object "Token" {
    code {
        // Store the creator in slot zero.
        sstore(0, caller())

        // Deploy the contract
        datacopy(0, dataoffset("runtime"), datasize("runtime"))
        return(0, datasize("runtime"))
    }
    object "runtime" {
        code {
            // Protection against sending Ether
            require(iszero(callvalue()))

            // Dispatcher
            switch selector()
            case 0x70a08231 /* "balanceOf(address)" */ {
                returnUint(balanceOf(decodeAsAddress(0)))
            }
            case 0x18160ddd /* "totalSupply()" */ {
                returnUint(totalSupply())
            }
            case 0xa9059cbb /* "transfer(address,uint256)" */ {
                transfer(decodeAsAddress(0), decodeAsUint(1))
                returnTrue()
            }
            case 0x23b872dd /* "transferFrom(address,address,uint256)" */ {
                transferFrom(decodeAsAddress(0), decodeAsAddress(1), decodeAsUint(2))
                returnTrue()
            }
            case 0x095ea7b3 /* "approve(address,uint256)" */ {
                approve(decodeAsAddress(0), decodeAsUint(1))
                returnTrue()
            }
            case 0xdd62ed3e /* "allowance(address,address)" */ {
                returnUint(allowance(decodeAsAddress(0), decodeAsAddress(1)))
            }
            case 0x40c10f19 /* "mint(address,uint256)" */ {
                mint(decodeAsAddress(0), decodeAsUint(1))
                returnTrue()
            }
            default {
                revert(0, 0)
            }

            function mint(account, amount) {
                require(calledByOwner())

                mintTokens(amount)
                addToBalance(account, amount)
                emitTransfer(0, account, amount)
            }
            function transfer(to, amount) {
                executeTransfer(caller(), to, amount)
            }
            function approve(spender, amount) {
                revertIfZeroAddress(spender)
                setAllowance(caller(), spender, amount)
                emitApproval(caller(), spender, amount)
            }
            function transferFrom(from, to, amount) {
                decreaseAllowanceBy(from, caller(), amount)
                executeTransfer(from, to, amount)
            }

            function executeTransfer(from, to, amount) {
                revertIfZeroAddress(to)
                deductFromBalance(from, amount)
                addToBalance(to, amount)
                emitTransfer(from, to, amount)
            }


            /* ---------- calldata decoding functions ----------- */
            function selector() -> s {
                s := div(calldataload(0), 0x100000000000000000000000000000000000000000000000000000000)
            }

            function decodeAsAddress(offset) -> v {
                v := decodeAsUint(offset)
                if iszero(iszero(and(v, not(0xffffffffffffffffffffffffffffffffffffffff)))) {
                    revert(0, 0)
                }
            }
            function decodeAsUint(offset) -> v {
                let pos := add(4, mul(offset, 0x20))
                if lt(calldatasize(), add(pos, 0x20)) {
                    revert(0, 0)
                }
                v := calldataload(pos)
            }
            /* ---------- calldata encoding functions ---------- */
            function returnUint(v) {
                mstore(0, v)
                return(0, 0x20)
            }
            function returnTrue() {
                returnUint(1)
            }

            /* -------- events ---------- */
            function emitTransfer(from, to, amount) {
                let signatureHash := 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
                emitEvent(signatureHash, from, to, amount)
            }
            function emitApproval(from, spender, amount) {
                let signatureHash := 0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925
                emitEvent(signatureHash, from, spender, amount)
            }
            function emitEvent(signatureHash, indexed1, indexed2, nonIndexed) {
                mstore(0, nonIndexed)
                log3(0, 0x20, signatureHash, indexed1, indexed2)
            }

            /* -------- storage layout ---------- */
            function ownerPos() -> p { p := 0 }
            function totalSupplyPos() -> p { p := 1 }
            function accountToStorageOffset(account) -> offset {
                offset := add(0x1000, account)
            }
            function allowanceStorageOffset(account, spender) -> offset {
                offset := accountToStorageOffset(account)
                mstore(0, offset)
                mstore(0x20, spender)
                offset := keccak256(0, 0x40)
            }

            /* -------- storage access ---------- */
            function owner() -> o {
                o := sload(ownerPos())
            }
            function totalSupply() -> supply {
                supply := sload(totalSupplyPos())
            }
            function mintTokens(amount) {
                sstore(totalSupplyPos(), safeAdd(totalSupply(), amount))
            }
            function balanceOf(account) -> bal {
                bal := sload(accountToStorageOffset(account))
            }
            function addToBalance(account, amount) {
                let offset := accountToStorageOffset(account)
                sstore(offset, safeAdd(sload(offset), amount))
            }
            function deductFromBalance(account, amount) {
                let offset := accountToStorageOffset(account)
                let bal := sload(offset)
                require(lte(amount, bal))
                sstore(offset, sub(bal, amount))
            }
            function allowance(account, spender) -> amount {
                amount := sload(allowanceStorageOffset(account, spender))
            }
            function setAllowance(account, spender, amount) {
                sstore(allowanceStorageOffset(account, spender), amount)
            }
            function decreaseAllowanceBy(account, spender, amount) {
                let offset := allowanceStorageOffset(account, spender)
                let currentAllowance := sload(offset)
                require(lte(amount, currentAllowance))
                sstore(offset, sub(currentAllowance, amount))
            }

            /* ---------- utility functions ---------- */
            function lte(a, b) -> r {
                r := iszero(gt(a, b))
            }
            function gte(a, b) -> r {
                r := iszero(lt(a, b))
            }
            function safeAdd(a, b) -> r {
                r := add(a, b)
                if or(lt(r, a), lt(r, b)) { revert(0, 0) }
            }
            function calledByOwner() -> cbo {
                cbo := eq(owner(), caller())
            }
            function revertIfZeroAddress(addr) {
                require(addr)
            }
            function require(condition) {
                if iszero(condition) { revert(0, 0) }
            }
        }
    }
}
```
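For readers who want to trace the calldata handling, here is a Python model of `selector`, `decodeAsUint`, `decodeAsAddress` and `safeAdd` from the contract above. `Revert` is a hypothetical exception standing in for `revert(0, 0)`, and calldata is assumed to be a `bytes` object. The division in `selector()` is meant to keep only the first four bytes, which the model expresses as division by `2**224`.

```python
UINT256 = 2**256
ADDRESS_MASK = (1 << 160) - 1


class Revert(Exception):
    """Stands in for revert(0, 0)."""


def calldataload(calldata: bytes, pos: int) -> int:
    """Read a 32-byte big-endian word; bytes past the end of calldata read as zero."""
    word = calldata[pos:pos + 32]
    return int.from_bytes(word + bytes(32 - len(word)), "big")


def selector(calldata: bytes) -> int:
    return calldataload(calldata, 0) // (1 << 224)  # keep only the first 4 bytes


def decode_as_uint(calldata: bytes, offset: int) -> int:
    pos = 4 + offset * 0x20
    if len(calldata) < pos + 0x20:  # argument word not fully present
        raise Revert()
    return calldataload(calldata, pos)


def decode_as_address(calldata: bytes, offset: int) -> int:
    v = decode_as_uint(calldata, offset)
    if v & ~ADDRESS_MASK:  # any of the upper 96 bits set
        raise Revert()
    return v


def safe_add(a: int, b: int) -> int:
    r = (a + b) % UINT256  # EVM add wraps modulo 2**256
    if r < a or r < b:
        raise Revert()
    return r
```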