feat: optimize frame layout for tail-call-only functions #11608
Reduce frame size from 16 to 8 bytes for functions that only make tail calls (FunctionCalls::TailOnly). This optimization:

- Uses single-register operations (str/ldr fp) instead of pair operations (stp/ldp fp, lr)
- Applies when no other frame requirements exist (no frame pointers, stack args, etc.)
- Is instruction-based: functions containing only return_call instructions get optimized
- Maintains ABI compatibility and includes comprehensive test coverage
pnodet commented Sep 4, 2025
@cfallin What do you think of something like this? I have only looked into aarch64 for the moment, since other ISAs such as x64 and s390x look quite different and more complex to implement.
cfallin commented Sep 4, 2025
Unfortunately I don't think this is going to work: the stack pointer has to be 16-aligned, and aarch64 will actually trap if memory accesses occur with a misaligned SP. Furthermore, the savings I would expect are not "only push FP, not LR", but "don't push anything at all if the frame is zero-size". This should be the case for tail-calling functions with no stack storage (spillslots, stackslots or clobbers) and no outgoing argument space.
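The alignment point can be illustrated with a small sketch. This is not Cranelift code: the constant and helper names below are made up for illustration, and the sketch assumes SP is 16-byte aligned on function entry. It only shows why an 8-byte setup area cannot work while 0 and 16 can.

```rust
/// Required stack pointer alignment on aarch64, in bytes.
const SP_ALIGN: u32 = 16;

/// Hypothetical helper: returns true if moving SP by `frame_bytes`
/// leaves it 16-byte aligned, assuming it was aligned on entry.
fn keeps_sp_aligned(frame_bytes: u32) -> bool {
    frame_bytes % SP_ALIGN == 0
}

/// Hypothetical helper: rounds a raw frame size up to the next
/// multiple of 16 so that SP stays aligned after the adjustment.
fn align_frame_size(raw_bytes: u32) -> u32 {
    (raw_bytes + SP_ALIGN - 1) & !(SP_ALIGN - 1)
}
```

Under these assumptions, `keeps_sp_aligned(8)` is false and `align_frame_size(8)` rounds back up to 16, so pushing only FP saves nothing once alignment is restored. The only real saving is the zero-size frame.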
pnodet commented Sep 4, 2025
Don't debuggers rely on frame pointers for stack traces? Could setting the frame size to 0 hurt debugging/unwinding?
bjorn3 commented Sep 4, 2025
Debuggers and profilers should already handle missing stack frames for leaf functions. Besides, debuggers generally use .eh_frame for stack unwinding, only falling back to frame pointers when .eh_frame is not available.
cfallin commented Sep 4, 2025
Right -- we already omit frame pointers for functions that are truly leaf functions (no calls at all, with no frame storage); this is a common optimization. In Wasmtime, where we use our own stack-walking logic and unwinder and want simplicity/robustness, we configure Cranelift never to omit frame pointers; so this optimization largely applies to other uses of Cranelift, like bjorn3's
pnodet commented Sep 4, 2025
Then could it be safe to have something like this?

         // Compute linkage frame size.
         let setup_area_size = if flags.preserve_frame_pointers()
             // The function arguments that are passed on the stack are addressed
             // relative to the Frame Pointer.
             || flags.unwind_info()
             || incoming_args_size > 0
             || clobber_size > 0
             || fixed_frame_storage_size > 0
         {
             16 // FP, LR
         } else {
             match function_calls {
                 FunctionCalls::Regular => 16,
                 FunctionCalls::None => 0,
    -            FunctionCalls::TailOnly => 8,
    +            FunctionCalls::TailOnly => 0,
             }
         };
cfallin commented Sep 4, 2025
I think you'll want to check the tail args and outgoing args size as well (the other parameters to
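One possible reading of this suggestion is sketched below as a standalone function. It is not the actual Cranelift implementation: the `FrameInputs` struct and its field names (including `tail_args_size` and `outgoing_args_size`) are assumptions made for illustration, and the local `FunctionCalls` enum only mirrors the variant names used in this thread.

```rust
/// Local stand-in for the enum discussed above (variant names from the thread).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum FunctionCalls {
    None,
    TailOnly,
    Regular,
}

/// Hypothetical bundle of the inputs to the frame-size decision.
#[derive(Clone, Copy, Debug)]
struct FrameInputs {
    preserve_frame_pointers: bool,
    unwind_info: bool,
    function_calls: FunctionCalls,
    incoming_args_size: u32,
    tail_args_size: u32,
    outgoing_args_size: u32,
    clobber_size: u32,
    fixed_frame_storage_size: u32,
}

/// Returns the size of the FP/LR setup area: 16 bytes if anything
/// requires a frame, otherwise 0. It never returns 8, so SP stays
/// 16-byte aligned.
fn setup_area_size(f: &FrameInputs) -> u32 {
    let needs_frame = f.preserve_frame_pointers
        || f.unwind_info
        // Regular calls overwrite LR, so it must be saved.
        || f.function_calls == FunctionCalls::Regular
        || f.incoming_args_size > 0
        // Assumed extra checks, following the comment above.
        || f.tail_args_size > 0
        || f.outgoing_args_size > 0
        || f.clobber_size > 0
        || f.fixed_frame_storage_size > 0;
    if needs_frame { 16 } else { 0 }
}
```

With every size at zero and both flags off, a `TailOnly` function gets a zero-size setup area in this sketch, the same as a `None` function. Setting any single size to a non-zero value brings back the full 16-byte FP/LR pair.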