In computer science, the Sethi–Ullman algorithm is an algorithm named after Ravi Sethi and Jeffrey D. Ullman, its inventors, for translating abstract syntax trees into machine code that uses as few registers as possible.
When generating code for arithmetic expressions, the compiler has to decide on the best way to translate the expression, both in terms of the number of instructions used and the number of registers needed to evaluate a given subtree. Especially when free registers are scarce, the order of evaluation can affect the length of the generated code, because different orderings may cause more or fewer intermediate values to be spilled to memory and then restored. The Sethi–Ullman algorithm (also known as Sethi–Ullman numbering) produces code that needs the fewest instructions possible as well as the fewest storage references (under the assumption that at most commutativity and associativity apply to the operators used, but distributive laws, i.e. a * b + a * c = a * (b + c), do not hold). The algorithm also succeeds if neither commutativity nor associativity holds for the expressions used, so that arithmetic transformations cannot be applied. The algorithm does not take advantage of common subexpressions, nor does it apply directly to expressions represented as general directed acyclic graphs rather than trees.
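The simple Sethi–Ullman algorithm works as follows (for a load/store architecture): label every node of the abstract syntax tree with the number of registers needed to evaluate its subtree, then emit code so that, at each operator, the operand subtree needing more registers is evaluated first.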
For the arithmetic expression a = (b + c + f * g) * (d + 3), the abstract syntax tree looks like this:
               =
              / \
             a   *
                / \
               /   \
              +     +
             / \   / \
            /   \ d   3
           +     *
          / \   / \
         b   c f   g
To continue with the algorithm, we need only to examine the arithmetic expression, i.e. we only have to look at the right subtree of the assignment '=':
               *
              / \
             /   \
            +     +
           / \   / \
          /   \ d   3
         +     *
        / \   / \
       b   c f   g
Now we start traversing the tree (in preorder for now), assigning the number of registers needed to evaluate each subtree (note that the last summand in the expression is a constant):
               *₂
              / \
             /   \
            +₂    +₁
           / \   / \
          /   \ d₁  3₀
         +₁    *₁
        / \   / \
       b₁  c₀ f₁  g₀
From this tree it can be seen that we need 2 registers to compute the left subtree of the '*', but only 1 register to compute the right subtree. Nodes 'c' and 'g' do not need registers for the following reason: if T is a tree leaf, then the number of registers needed to evaluate T is either 1 or 0, depending on whether T is a left or a right subtree, since an operation such as add R1, A can handle the right operand A directly without first loading it into a register.
We therefore emit code for the left subtree first, because we might be in a situation where only 2 registers are left to compute the whole expression. If we computed the right subtree first (which needs only 1 register), we would then need a register to hold its result while computing the left subtree (which still needs 2 registers), so 3 registers would be needed concurrently. Computing the left subtree first needs 2 registers, but its result can be held in 1, and since the right subtree needs only 1 register, the whole expression can be evaluated with only 2 registers.
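The labelling and the evaluation order described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the Node class, the functions label and emit, the register names R0 and R1, and the mnemonics load, add and mul are all assumptions made for the example, and the sketch assumes enough free registers are available, so it never spills.

```python
from dataclasses import dataclass
from typing import List, Optional

MNEMONIC = {"+": "add", "*": "mul"}  # hypothetical instruction names


@dataclass
class Node:
    """Expression tree node: a leaf has no children, an operator has two."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    regs: int = 0  # filled in by label()

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def label(node: Node, is_left_child: bool = True) -> int:
    """Store in every node the number of registers its subtree needs."""
    if node.is_leaf:
        # A left leaf must be loaded; a right leaf is used directly.
        node.regs = 1 if is_left_child else 0
    else:
        l = label(node.left, True)
        r = label(node.right, False)
        # Equal needs: one extra register holds the first result
        # while the second subtree is being evaluated.
        node.regs = l + 1 if l == r else max(l, r)
    return node.regs


def emit(node: Node, free: List[str], code: List[str]) -> str:
    """Append instructions for node to code; return the result register."""
    if node.is_leaf:
        reg = free.pop()
        code.append(f"load {reg}, {node.value}")
        return reg
    op = MNEMONIC[node.value]
    if node.right.is_leaf:
        # The right operand is addressed directly, as in "add R1, A".
        reg = emit(node.left, free, code)
        code.append(f"{op} {reg}, {node.right.value}")
        return reg
    # Evaluate the subtree that needs more registers first.
    if node.left.regs >= node.right.regs:
        lreg = emit(node.left, free, code)
        rreg = emit(node.right, free, code)
    else:
        rreg = emit(node.right, free, code)
        lreg = emit(node.left, free, code)
    code.append(f"{op} {lreg}, {rreg}")
    free.append(rreg)  # the right-hand result is no longer needed
    return lreg


# (b + c + f * g) * (d + 3), the right subtree of the assignment
root = Node("*",
            Node("+",
                 Node("+", Node("b"), Node("c")),
                 Node("*", Node("f"), Node("g"))),
            Node("+", Node("d"), Node("3")))
label(root)
code: List[str] = []
result = emit(root, ["R1", "R0"], code)  # only two free registers
print(root.regs)  # prints 2
print("\n".join(code))
```

With only R0 and R1 available the sketch finishes and leaves the result in R0. If the right subtree of '*' were evaluated first instead, a third register would be needed and the two-element free list would run out.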
In an advanced version of the Sethi–Ullman algorithm, the arithmetic expressions are first transformed, exploiting the algebraic properties of the operators used.
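One simple transformation of this kind uses commutativity alone: when a commutative operator has a leaf on the left and a subtree on the right, swapping the operands lets the leaf be used directly as the right operand. The sketch below is illustrative only and is not claimed to be the transformation the advanced algorithm performs. The tuple representation of trees and the names COMMUTATIVE, need and reorder are assumptions made for the example.

```python
COMMUTATIVE = {"+", "*"}  # operators assumed to be commutative


def need(tree, is_left=True):
    """Registers needed; tree is a leaf name (str) or (op, left, right)."""
    if isinstance(tree, str):
        return 1 if is_left else 0
    _, left, right = tree
    l, r = need(left, True), need(right, False)
    return l + 1 if l == r else max(l, r)


def reorder(tree):
    """Swap operands of commutative operators whose left operand is a
    leaf and whose right operand is a subtree."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    left, right = reorder(left), reorder(right)
    if op in COMMUTATIVE and isinstance(left, str) and not isinstance(right, str):
        left, right = right, left
    return (op, left, right)


expr = ("+", "a", ("*", "b", "c"))  # a + b * c
print(need(expr))            # prints 2
print(need(reorder(expr)))   # prints 1
```

Under the labelling rule used in this article, the swap never increases the register count: a subtree on the right always needs at least 1 register, so moving the leaf to the right can only remove the "equal needs" case.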