Overall, we make the flat representation of longs as a pair of `(lo, hi)` words *the* representation at the Emitter level. There are no more instances of `RuntimeLong`. The emitter flattens out all the `Long`s as follows:
- A local variable of type `long` becomes two local variables of type `int`.
- A field of type `long` becomes two fields of type `int`.
- An `Array[Long]` is stored as an `Int32Array` with twice as many elements, alternating `lo` and `hi` words (sketched below).
- Method parameters of type `long` are expanded as two parameters of type `int`.
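
As a conceptual illustration of the array encoding, assuming the `lo` word sits at the even index (a plain `Array[Int]` stands in for the underlying `Int32Array`; this only sketches the indexing, not the emitter's actual code):

```scala
object LongArraySketch {
  // Element i of a flattened Array[Long] occupies slots 2*i (lo word)
  // and 2*i + 1 (hi word) of the backing int array.
  def readWords(words: Array[Int], i: Int): (Int, Int) =
    (words(2 * i), words(2 * i + 1))

  def writeWords(words: Array[Int], i: Int, lo: Int, hi: Int): Unit = {
    words(2 * i) = lo
    words(2 * i + 1) = hi
  }
}
```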
For the result of methods, there is a trick. That one is the most "debatable", in that I think there are contending alternatives that may be faster. When a method returns a `Long`, it stores the `hi` word in a global variable `$resHi`, then returns the `lo` word. At the call site, we read back the `$resHi` global. We used a similar trick for non-inlined methods of `RuntimeLong`, with a `var resultHi` field of `object RuntimeLong`. Now this is handled by the emitter instead.
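
To make the mechanism concrete, here is a simplified sketch of the older `RuntimeLong`-level version of the trick (illustrative names and a trivial operation; the emitter-level `$resHi` global plays the same role, only in the emitted JavaScript):

```scala
object LongResultSketch {
  // Out-of-band channel for the hi word of the last Long result
  // (stand-in for `var resultHi` / the emitted `$resHi` global).
  var resultHi: Int = 0

  // A Long-returning operation stores its hi word aside and returns its lo word.
  def neg(lo: Int, hi: Int): Int = {
    resultHi = if (lo == 0) -hi else ~hi // carry into the hi word iff lo == 0
    -lo
  }

  // At the call site, the hi word is read back immediately after the call.
  def useNeg(lo: Int, hi: Int): (Int, Int) = {
    val resLo = neg(lo, hi)
    (resLo, resultHi)
  }
}
```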
All the methods of `RuntimeLong` explicitly take "expanded" versions of their parameters: `abs` takes two parameters of type `Int`; `add` takes 4. Shifts take 3 parameters of type `Int`: the `lo`, the `hi`, and the shift amount. The result, however, is a `Long`.
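
The signatures therefore look roughly like this (a sketch; the parameter names and the shift method's name `shl` are illustrative, since only `abs` and `add` are named above):

```scala
// Illustrative trait only, showing the shape of the expanded signatures.
trait ExpandedLongOps {
  def abs(lo: Int, hi: Int): Long                       // 2 Ints in, Long out
  def add(alo: Int, ahi: Int, blo: Int, bhi: Int): Long // 4 Ints in, Long out
  def shl(lo: Int, hi: Int, n: Int): Long               // lo, hi, shift amount
}
```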
In order to allow them to construct `Long` results from their words, we introduce a magic method `RuntimeLong.pack(lo, hi)`, whose body is filled in by the `Desugarer` with a special `Transient(PackLong(lo, hi))`. It cannot be the compiler, because we cannot serialize transients. And it cannot wait for the emitter, because the optimizer definitely wants to see the `PackLong`s to unpack them. An alternative would be to introduce a new `BinaryOp`, but I think that's worse because it bakes an implementation detail of the emitter into the IR. The fact that we can even do this PR is a testament to the current abstraction level of our IR.
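
As an illustration of how an expanded operation can assemble its result through `pack`, here is a sketch of a double-word `add` (not the actual `RuntimeLong` implementation; a plain Scala helper stands in for the magic `pack`, whose real body is the `Transient(PackLong(lo, hi))` described above):

```scala
object PackSketch {
  // Stand-in for RuntimeLong.pack(lo, hi): here it simply builds a regular Long.
  def pack(lo: Int, hi: Int): Long =
    (hi.toLong << 32) | (lo.toLong & 0xffffffffL)

  // Double-word addition on expanded (lo, hi) parameters.
  def add(alo: Int, ahi: Int, blo: Int, bhi: Int): Long = {
    val lo = alo + blo
    // Carry into the hi word iff the unsigned low-word addition overflowed.
    val carry = if (Integer.compareUnsigned(lo, alo) < 0) 1 else 0
    pack(lo, ahi + bhi + carry)
  }
}
```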
These changes significantly speed up even the SHA512 benchmark, even though it performs most of its computations on the stack already anyway (the improvements must come from the arrays, in this case). I haven't measured benchmarks that extensively use `Long` fields yet, but I expect them to get dramatic speedups.
For the optimizer, this makes things a lot simpler. Instead of having special cases for `RuntimeLong` everywhere, we basically have a unique `withSplitLong` method to deal with them. That method can split one `PreTransform` of type `long` into two `PreTransform`s of type `int`, so that they can be given to the inlined methods of `RuntimeLong`. We introduce a new `LongPairReplacement` for `LocalDef`s, which aggregates a pair of `(lo, hi)` `LocalDef`s (typically the result of a split `PackLong`). As a nice bonus, the IR checker now passes with `RuntimeLong` after the optimizer.
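
Purely as a hypothetical sketch of what such a splitting helper could look like (the actual `withSplitLong` in the optimizer is not shown in this description and its signature may well differ; the stub types stand in for the optimizer's own):

```scala
// Hypothetical shape only, not the actual optimizer code.
trait SplitLongSketch {
  type PreTransform // a not-yet-materialized transformed expression
  type Result       // whatever the surrounding transformation produces

  /** Splits one `PreTransform` of type `long` into its `lo` and `hi` word
   *  `PreTransform`s and hands them to the continuation as two `int`s.
   */
  def withSplitLong(tlong: PreTransform)(
      cont: (PreTransform, PreTransform) => Result): Result
}
```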
One big caveat for now: Closure breaks the new encoding, sometimes. I think it gets confused by the `$resHi` variable and the evaluation order of function params. For example, if we pass the result of a `Long` method to a `Long` parameter, we emit something like:
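
```js
// Sketch of the emitted call, inferred from the description below:
// the lo word comes from y.bar(1) and the hi word is read from $resHi.
x.foo(y.bar(1), $resHi)
```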
During `y.bar(1)`, the method modifies `$resHi`. It is then read right after to be passed as the second argument to `x.foo`.