-
-
Notifications
You must be signed in to change notification settings - Fork 32.5k
GH-136410: Faster side exits #136411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
GH-136410: Faster side exits #136411
Conversation
// from being immediately detected as cold and invalidated. | ||
cold->vm_data.warm = true; | ||
if (_PyJIT_Compile(cold, cold->trace, 1)) { | ||
Py_DECREF(cold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's immortal. Decrefing won't do anything. I think you mean to call the dealloc function on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot. I think I need to move making it immortal to after it is fully created.
@pablogsal any ideas what this failure means? https://github.com/python/cpython/actions/runs/16140914160/job/45547999926?pr=136411 |
Is a race condition in the tests that will be fixed by #136347 |
Performance is in the noise:
However, I wouldn't expect much of a speedup by reducing the size of exits, so this seems reasonable. |
I think the benchmarking public mirror is down. Can you please share the link when it comes up again? |
Note to self: The cold exit executor should be freed when freeing the interpreter to avoid leaking memory. |
executor->exits[i].temperature = initial_temperature_backoff_counter(); | ||
executor->exits[i].executor = cold; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We put cold_executor
here without INCREF because cold_executor
is immortal. But in executor_clear
we DECREF each executor from exits
because there can be other executors as well. But this is imbalance of INCREF/DECREF. Maybe it is worth to add a comment here that we omit INCREF because of immortality of cold executor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of notes. I like the general idea, not a huge fan of the jit_exit
side-channel (but I get why we have it).
Py_FatalError("Cannot allocate core JIT code"); | ||
} | ||
#endif | ||
_Py_SetImmortal((PyObject *)cold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be immortal? This means that we'll leak one of these per interpreter, along with about a page of JIT code. I think the interpreter can just hold a normal reference that we free at shutdown, right?
@@ -584,6 +584,7 @@ translate_bytecode_to_trace( | |||
code->co_firstlineno, | |||
2 * INSTR_IP(initial_instr, code)); | |||
ADD_TO_TRACE(_START_EXECUTOR, 0, (uintptr_t)instr, INSTR_IP(instr, code)); | |||
ADD_TO_TRACE(_CHECK_VALIDITY, 0, 0, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended that this will be checked once per loop? JUMP_TO_TOP
goes here.
// This is initialized to true so we can prevent the executor | ||
// from being immediately detected as cold and invalidated. | ||
cold->vm_data.warm = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed, since it may be invalidated as "cold" at some point, but we never check its validity anyways. We could maybe not link it at all, but not sure if that breaks some subtle invariant (we have quite a few of those).
} | ||
exit->temperature = initial_temperature_backoff_counter(); | ||
} | ||
assert(tstate->jit_exit == exit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe set it to NULL
now?
@@ -584,6 +584,7 @@ translate_bytecode_to_trace( | |||
code->co_firstlineno, | |||
2 * INSTR_IP(initial_instr, code)); | |||
ADD_TO_TRACE(_START_EXECUTOR, 0, (uintptr_t)instr, INSTR_IP(instr, code)); | |||
ADD_TO_TRACE(_CHECK_VALIDITY, 0, 0, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't think this is enough to replace the condition we had before in _EXIT_TRACE
. If an executor hanging off of a side-exit is invalidated, it won't clear the executor and reset the exit like we did before, right?
When you're done making the requested changes, leave the comment: |
This PR reinstates the
_COLD_EXIT
uop, but this time expects the exit to be passed, not the executor.This way we only need one jitted stub, not hundreds.
Using stubs hugely simplifies
_EXIT_TRACE
as all it needs to do is jump to the exit's executor.The x86-64 stencil for
_EXIT_TRACE
shrinks from 384 bytes to 36 bytes, although it does require one extra_CHECK_VALIDITY
(19 bytes) to be added to each trace.Since traces often contain multiple
_EXIT_TRACE
s this is a substantial space saving.