Skip to content

ZJIT: Add MemBase::FrameBase that deals with native SP changes (+2) #14009

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

XrXr
Copy link
Member

@XrXr XrXr commented Jul 24, 2025

This solves an immediate problem with def a(n1,n2,n3,n4,n5,n6,n7,n8) = [n8] and gen_new_array(). In the prior, it miscomps as follows:

; call rb_ary_new_capa
0x90: mov x0, #1
0x94: mov x16, #0x1278
0x98: movk x16, #0x4bc, lsl #16
0x9c: movk x16, #1, lsl #32
0xa0: blr x16
; call rb_ary_push
0xa4: mov x1, x0
0xa8: str x1, [sp, #-0x10]! ; c_push() from alloc_reg() to preserve the array past the call below
0xac: mov x0, x1            ; arg0, the array
0xb0: ldur x1, [sp]         ; arg1, n8, but we've just moved sp and now that refers to the array!
0xb4: mov x16, #0x3968
0xb8: movk x16, #0x4bc, lsl #16
0xbc: movk x16, #1, lsl #32
0xc0: blr x16
0xc4: ldr x1, [sp], #0x10

The problem here is that n8 is assigned a fixed offset from sp in codegen.rs, but the backend can move sp. To solve this, this diff adds MemBase::FrameBase, which keeps track of pushes and pops so code can refer to the pre-modification sp (base sp, as I call it in the diff). The tracking has to be done late in the backend because the backend inserts pushes and pops, like we see here.

This will also be useful for things like concatstrings, where we want to generate LIR that pushes onto the native stack and still refer to other on stack elements afterwards.

Now, I'm not a huge fan of this because this feels like a leaky abstraction. It can only track compile-time known modifications to SP like stack pushes and pops. Dynamic stack space allocation will silently miscomp. The tracking is also sensitive to the sequence of LIR that ends up modifying SP, so compile time knowable modifications might not always be properly tracked if they're too indirect.

What to do then?

Some quick possible alternatives:

  • Don't ever move SP inside the body of functions. This implies tallying all the stack usages up front in the backend and reserving that amount in the prologue. For cases like this one where alloc_reg() needs to preserve values, it would store directly to the preallocated space instead of using push/pop. Also implies banning dynamic stack space allocation using true runtime numbers.
  • Detect this pitfall by panicking whenever the backend sees push/pop, and also, panic on attempts to hold values across calls. This makes codegen hard to correctly write. Probably too hard.

I'd like your opinion on this.

XrXr added 3 commits July 23, 2025 22:42
Keeping the same name makes re-exporting more concise.
This is to fix `def a(n1,n2,n3,n4,n5,n6,n7,n8) = [n8]`, and for future
changes that push values onto the native stack. Before this, the
backend automatically preserved the array in gen_new_array() with an
inserted `asm.c_push()`, which invalidated the static memory offset
based on native SP for `n8`.
@matzbot matzbot requested a review from a team July 24, 2025 17:43
@tekknolagi
Copy link
Contributor

I like high watermark and bump SP (with RBP) approach personally

@k0kubun
Copy link
Member

k0kubun commented Jul 24, 2025

Also implies banning dynamic stack space allocation using true runtime numbers

If we always allocate iseq->body->stack_max slots on the stack, would it always guarantee that there's enough space for processing gen_new_array() or concatstrings?

I like high watermark and bump SP (with RBP) approach personally

The problem Alan is talking about is that the backend moves SP on its own for pushing live registers on C calls. To address that with a high watermark, you need to know the maximum number of live registers on C calls. It's impossible to know as of setting up a frame, so I guess we'd have to always bump SP by the number of allocatable registers.

I wonder how much stack space we can safely waste for each frame.

@XrXr
Copy link
Member Author

XrXr commented Jul 24, 2025

I like high watermark and bump SP (with RBP) approach personally

Ah yes, that's another approach. If we're willing to preserve RBP on x86 (we don't currently) then we can push and pop and refer to things using a static offset based on RBP. On ARM we already always have an RBP equivalent because macOS requires it.

@k0kubun
Copy link
Member

k0kubun commented Jul 24, 2025

Sorry for misunderstanding the idea. So... I guess we'll tweak the FrameSetup for x86_64 and switch to RBP-based operands then?

Copy link

Tests Failed

✖️no tests failed ✔️62162 tests passed(1 flake)

@XrXr
Copy link
Member Author

XrXr commented Jul 24, 2025

I guess we'll tweak the FrameSetup for x86_64 and switch to RBP-based operands then?

Yes that's the idea. Seems like the best option for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants