Overall, we make the flat representation of longs as a pair of `(lo, hi)` words *the* representation at the Emitter level. There are no more instances of `RuntimeLong`. The emitter flattens out all the `Long`s as follows:
- A local variable of type `long` becomes two local variables of type `int`.
- A field of type `long` becomes two fields of type `int`.
- An `Array[Long]` is stored as an `Int32Array` with twice as many elements, alternating `lo` and `hi` words (sketched below).
- Method parameters of type `long` are expanded as two parameters of type `int`.
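
As a conceptual illustration of the array encoding, assuming the `lo` word sits at the even index (a plain `Array[Int]` stands in for the underlying `Int32Array`; this only sketches the indexing, not the emitter's actual code):

```scala
object LongArraySketch {
  // Element i of a flattened Array[Long] occupies slots 2*i (lo word)
  // and 2*i + 1 (hi word) of the backing int array.
  def readWords(words: Array[Int], i: Int): (Int, Int) =
    (words(2 * i), words(2 * i + 1))

  def writeWords(words: Array[Int], i: Int, lo: Int, hi: Int): Unit = {
    words(2 * i) = lo
    words(2 * i + 1) = hi
  }
}
```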
For the result of methods, there is a trick. That one is the most "debatable", in that I think there are contending alternatives that may be faster. When a method returns a `Long`, it stores the `hi` word in a global variable `$resHi`, then returns the `lo` word. At the call site, we read back the `$resHi` global. We used a similar trick for non-inlined methods of `RuntimeLong`, with a `var resultHi` field of `object RuntimeLong`. Now this is handled by the emitter instead.
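
To make the mechanism concrete, here is a simplified sketch of the older `RuntimeLong`-level version of the trick (illustrative names and a trivial operation; the emitter-level `$resHi` global plays the same role, only in the emitted JavaScript):

```scala
object LongResultSketch {
  // Out-of-band channel for the hi word of the last Long result
  // (stand-in for `var resultHi` / the emitted `$resHi` global).
  var resultHi: Int = 0

  // A Long-returning operation stores its hi word aside and returns its lo word.
  def neg(lo: Int, hi: Int): Int = {
    resultHi = if (lo == 0) -hi else ~hi // carry into the hi word iff lo == 0
    -lo
  }

  // At the call site, the hi word is read back immediately after the call.
  def useNeg(lo: Int, hi: Int): (Int, Int) = {
    val resLo = neg(lo, hi)
    (resLo, resultHi)
  }
}
```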
All the methods of `RuntimeLong` explicitly take "expanded" versions of their parameters: `abs` takes two parameters of type `Int`; `add` takes 4. Shifts take 3 parameters of type `Int`: the `lo`, the `hi`, and the shift amount. The result, however, is a `Long`.
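
The signatures therefore look roughly like this (a sketch; the parameter names and the shift method's name `shl` are illustrative, since only `abs` and `add` are named above):

```scala
// Illustrative trait only, showing the shape of the expanded signatures.
trait ExpandedLongOps {
  def abs(lo: Int, hi: Int): Long                       // 2 Ints in, Long out
  def add(alo: Int, ahi: Int, blo: Int, bhi: Int): Long // 4 Ints in, Long out
  def shl(lo: Int, hi: Int, n: Int): Long               // lo, hi, shift amount
}
```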
In order to allow them to construct `Long` results from their words, we introduce a magic method `RuntimeLong.pack(lo, hi)`, whose body is filled in by the `Desugarer` with a special `Transient(PackLong(lo, hi))`. It cannot be the compiler, because we cannot serialize transients. And it cannot wait for the emitter, because the optimizer definitely wants to see the `PackLong`s to unpack them. An alternative would be to introduce a new `BinaryOp`, but I think that's worse because it bakes an implementation detail of the emitter into the IR. The fact that we can even do this PR is a testament to the current abstraction level of our IR.
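
As an illustration of how an expanded operation can assemble its result through `pack`, here is a sketch of a double-word `add` (not the actual `RuntimeLong` implementation; a plain Scala helper stands in for the magic `pack`, whose real body is the `Transient(PackLong(lo, hi))` described above):

```scala
object PackSketch {
  // Stand-in for RuntimeLong.pack(lo, hi): here it simply builds a regular Long.
  def pack(lo: Int, hi: Int): Long =
    (hi.toLong << 32) | (lo.toLong & 0xffffffffL)

  // Double-word addition on expanded (lo, hi) parameters.
  def add(alo: Int, ahi: Int, blo: Int, bhi: Int): Long = {
    val lo = alo + blo
    // Carry into the hi word iff the unsigned low-word addition overflowed.
    val carry = if (Integer.compareUnsigned(lo, alo) < 0) 1 else 0
    pack(lo, ahi + bhi + carry)
  }
}
```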
These changes significantly speed up even the SHA512 benchmark, even though it performs most of its computations on the stack already anyway (the improvements must come from the arrays, in this case). I haven't measured benchmarks that extensively use `Long` fields yet, but I expect them to get dramatic speedups.
For the optimizer, this makes things a lot simpler. Instead of having special cases for `RuntimeLong` everywhere, we basically have a unique `withSplitLong` method to deal with them. That method can split one `PreTransform` of type `long` into two `PreTransform`s of type `int`, so that they can be given to the inlined methods of `RuntimeLong`. We introduce a new `LongPairReplacement` for `LocalDef`s, which aggregates a pair of `(lo, hi)` `LocalDef`s (typically the result of a split `PackLong`). As a nice bonus, the IR checker now passes with `RuntimeLong` after the optimizer.
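
Purely as a hypothetical sketch of what such a splitting helper could look like (the actual `withSplitLong` in the optimizer is not shown in this description and its signature may well differ; the stub types stand in for the optimizer's own):

```scala
// Hypothetical shape only, not the actual optimizer code.
trait SplitLongSketch {
  type PreTransform // a not-yet-materialized transformed expression
  type Result       // whatever the surrounding transformation produces

  /** Splits one `PreTransform` of type `long` into its `lo` and `hi` word
   *  `PreTransform`s and hands them to the continuation as two `int`s.
   */
  def withSplitLong(tlong: PreTransform)(
      cont: (PreTransform, PreTransform) => Result): Result
}
```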
One big caveat for now: Closure breaks the new encoding, sometimes. I think it gets confused by the `$resHi` variable and the evaluation order of function params. For example, if we pass the result of a `Long` method to a `Long` parameter, we emit something like:
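
```js
// Sketch of the emitted call, inferred from the description below:
// the lo word comes from y.bar(1) and the hi word is read from $resHi.
x.foo(y.bar(1), $resHi)
```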
During `y.bar(1)`, the method modifies `$resHi`. It is then read right after to be passed as the second argument to `x.foo`.