Skip to content

WiP Redesign the encoding of Longs in JavaScript. #5205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sjrd
Copy link
Member

@sjrd sjrd commented Jun 25, 2025

Overall, we make the flat representation of longs as a pair of (lo, hi) the representation, at the Emitter level. There are no more instances of RuntimeLong. The emitter flattens out all the Longs as follows:

  • A local variable of type long becomes two local variables of type int.
  • A field of type long becomes two fields of type int.
  • An Array[Long] is stored as an Int32Array with twice as many elements, alternating lo and hi words.
  • Method parameters of type long are expanded as two parameters of type int.

For the result of method parameters, there is a trick. That one is the most "debatable", in that I think there are contending alternatives that may be faster. When a method returns a Long, it stores the hi word in a global variable $resHi, then returns the lo word. At call site, we read back the $resHi global. We used a similar trick for non-inlined methods of RuntimeLong, with a var resultHi field of object RuntimeLong. Now this is moved to the emitter to take care of.

All the methods of RuntimeLong explicitly take "expanded" versions of their parameters: abs takes two parameters of type Int; add takes 4. Shifts take 3 parameters of type Int: the lo, the hi, and the shift amount. The result, however, is a Long.

In order to allow them to construct Long results from their words, we introduce a magic method RuntimeLong.pack(lo, hi), whose body is filled in by the Desugarer with a special Transient(PackLong(lo, hi)). It cannot be the compiler, because we cannot serialize transients. And it cannot wait for the emitter, because the optimizer definitely wants to see the PackLongs to unpack them. An alternative would be to introduce a new BinaryOp, but I think that's worse because it bakes an implementation detail of the emitter into the IR. The fact that we can even do this PR is a testament to the current abstraction level of our IR.

These changes significantly speed up even the SHA512 benchmark, even though it performs most computations on stack already anyway (the improvements must come from the arrays, in this case). I haven't measured benchmarks that extensively use Long fields yet, but I expect them to get dramatic speedups.

For the optimizer, this makes things a lot simpler. Instead of having special-cases for RuntimeLong everywhere, we basically have a unique withSplitLong method to deal with them. That method can split one PreTransform of type long into two PreTransforms of type ints, so that they can be given to the inlined methods of RuntimeLong. We introduce a new LongPairReplacement for LocalDefs that aggregate a pair of (lo, hi) LocalDefs (typically the result of a split PackLong). As a nice bonus, the IR checker now passes with RuntimeLong after the optimizer.


One big caveat for now: Closure breaks the new encoding, sometimes. I think it gets confused by the $resHi variable and evaluation order of function params. For example if we pass the result of a Long method to a Long parameter, we emit

x.foo(y.bar(1), $resHi);

During y.bar(1), the method modifies $resHi. It is then read right after to be passed as second argument to x.foo.

@sjrd sjrd force-pushed the rt-long-expanded branch 6 times, most recently from 7161430 to 2fe28e7 Compare June 28, 2025 11:09
@sjrd
Copy link
Member Author

sjrd commented Jun 28, 2025

@gzm0 This is not yet ready for a code review. However, it is a good time to get your opinion on the overall new compilation scheme and architectural changes. WDYT?

@sjrd sjrd force-pushed the rt-long-expanded branch 2 times, most recently from f35c5dd to 3e51123 Compare June 28, 2025 15:12
@sjrd sjrd force-pushed the rt-long-expanded branch from 3e51123 to f85b1be Compare July 5, 2025 20:43
@sjrd
Copy link
Member Author

sjrd commented Jul 5, 2025

Some results from the bounce benchmark, where I have replaced the PRNG by a ju.SplittableRandom. Our SplittableRandom was not specially crafted to optimize its Long usages (unlike ju.Random), so it is probably characteristic of "everyday" Long manipulations.

Config Before After
default fastLink 38.5 37.0
optimized semantics 37.3 33.6
opt sems + minify 37.3 33.4

"Before" is #5204, and "After" is this PR.

It's not dramatic, but it's a strong 10% performance improvement.

@gzm0
Copy link
Contributor

gzm0 commented Jul 9, 2025

My thoughts on this based on the PR description and a very brief look at the code:

  • We should definitely do this. Much cleaner from so many POVs (and my attempts to make the IR checker work with a PreTransCast went nowhere so far ;-) ).
  • I agree with the sentiment that the "pack" operation should remain linker internal. As an alternative to the desugarer, have you considered patching the IR in injectedIRFiles of the Emitter?
  • I do not feel great about the $resHi return value solution. I suspect it's our least bad alternative, but some more evidence would be nice. Things to consider from the top of my head:
    • Fixed size 2 typed array
    • Destructuring assignment: IIUC a JS JIT could optimize this, but it feels unlikely: I'd expect the destructuring assignment to be desugared into a lower representation early in the JIT pipeline.

@sjrd
Copy link
Member Author

sjrd commented Jul 9, 2025

  • I agree with the sentiment that the "pack" operation should remain linker internal. As an alternative to the desugarer, have you considered patching the IR in injectedIRFiles of the Emitter?

I have considered it, and it's still in the cards. The only downside, AFAICT, is that the ClassDefChecker must then let it pass, even in the "initial" configuration that is available to user space. They'd have to really work for it, though, so it's probably fine.

  • I do not feel great about the $resHi return value solution. I suspect it's our least bad alternative, but some more evidence would be nice. Things to consider from the top of my head:

    • Fixed size 2 typed array
    • Destructuring assignment: IIUC a JS JIT could optimize this, but it feels unlikely: I'd expect the destructuring assignment to be desugared into a lower representation early in the JIT pipeline.

These are two of the options I'm considering. We'll have to measure which is faster. $resHi was the easiest to implement, precisely because of the evaluation order trick that I suspect GCC is not happy about. All the other alternatives will require more work in the function emitter to forcibly unnest all long expressions.

@sjrd sjrd force-pushed the rt-long-expanded branch from f85b1be to 704d99c Compare July 9, 2025 13:56
@sjrd sjrd force-pushed the rt-long-expanded branch 2 times, most recently from 32c14dd to d26f4c0 Compare July 17, 2025 17:21
@sjrd sjrd force-pushed the rt-long-expanded branch from 17ff9fc to 3f29eb1 Compare July 25, 2025 08:48
sjrd added 2 commits July 28, 2025 20:54
As a nice bonus, the IR checker now passes with `RuntimeLong`.
Sadly, that doesn't fix the GCC issue.

It may have performance, especially in multi-module outputs, as we
don't need to mutate a foreign var.
@sjrd sjrd force-pushed the rt-long-expanded branch from 3f29eb1 to 029a6fa Compare July 28, 2025 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants