-
Notifications
You must be signed in to change notification settings - Fork 1.4k
POC: Configurable Instruction autogen
#6316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds infrastructure for auto-generating Rust bytecode instruction enums from CPython opcode definitions. Changes include a Git submodule pointing to CPython 3.13.9, a TOML configuration file specifying opcode variants, a Python generation script that parses the configuration and synthesizes Rust code, and a refactored bytecode module with the generated instruction enum. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/compiler-core/src/bytecode.rs (1)
1163-1183: GeneratedInstructionenum incompatible with existingimplblocks.The
label_arg()method references variants and field names that don't exist in the generatedinstruction.rs:
JumpIfNotExcMatch,JumpIfTrueOrPop,JumpIfFalseOrPop,SetupExcept,SetupAsyncWith,Break,Continue- not in generated enum- Field names like
target,handler,enddon't match the generated fields (e.g.,deltais used in generated code)PopJumpIfTrue { target: l }but generated hasPopJumpIfTrue { delta: Arg<Label> }This pattern extends throughout
stack_effect()(lines 1220-1344) andfmt_dis()(lines 1413-1522), which reference ~50+ variants not present in the generated enum.Since this is a POC, consider either:
- Keeping the old
Instructionenum alongside the generated one until feature parity is reached- Updating the generator to produce all required variants and field names
- Adding a note in the PR description about the current compilation status
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode.rs (1)
15-17: Module declaration order is unconventional.The re-export (
pub use) appears before the module declaration (mod instruction). While this compiles, the conventional order is to declare modules first, then re-export from them.-pub use crate::bytecode::instruction::Instruction; - mod instruction; + +pub use crate::bytecode::instruction::Instruction;
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.gitmodules(1 hunks)crates/compiler-core/generate.py(1 hunks)crates/compiler-core/instructions.toml(1 hunks)crates/compiler-core/src/bytecode.rs(1 hunks)crates/compiler-core/src/bytecode/instruction.rs(1 hunks)submodules/cpython-v3.13.9(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code
Files:
crates/compiler-core/generate.py
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/compiler-core/src/bytecode/instruction.rscrates/compiler-core/src/bytecode.rs
🪛 Flake8 (7.3.0)
crates/compiler-core/generate.py
[error] 137-137: undefined name 'analyser'
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Run tests under miri
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Check Rust code with clippy
- GitHub Check: Ensure compilation on various targets
🔇 Additional comments (6)
submodules/cpython-v3.13.9 (1)
1-1: No changes needed to submodule configuration.The commit hash
8183fa5e3f78ca6ab862de7fb8b14f3d929421e0is the official CPython 3.13.9 release commit, and pinning to it is an explicit, stable approach. Shallow cloning is appropriate here since code generation typically requires only the current state of opcode definitions, not repository history.crates/compiler-core/instructions.toml (2)
17-23: LGTM!The active opcode configurations with overrides are well-documented. The use of
r#wherefor the Rust keyword escape and the comment explaining the 3.13 oparg change forYIELD_VALUEare helpful.
25-61: LGTM!The disabled opcodes section appropriately excludes specialized optimization variants and instrumentation opcodes that are not yet implemented. This is a reasonable approach for the POC stage.
crates/compiler-core/generate.py (2)
33-66: LGTM!The
Instclass properly handles override configurations and falls back to analyzed properties. The variant generation correctly produces Rust enum syntax with optional oparg fields.
68-100: TODOs noted for future improvement.The
Oparg.from_propertiesmethod has several TODO comments for adding proper types (ConstIdxforuses_co_consts, etc.) and raising errors instead of falling back tou32. These are appropriate for a POC but should be tracked for completion before this feature is fully integrated.Consider opening issues to track:
- Replace
u32fallback with properConstIdxtype foruses_co_consts- Add proper type for
uses_locals- Raise error instead of silent
u32fallback in the else branchcrates/compiler-core/src/bytecode/instruction.rs (1)
168-169: Verify unit variants forStoreAttr*instructions.
StoreAttrInstanceValueandStoreAttrSlotare defined as unit variants (no oparg), while similarLoadAttr*variants (lines 103, 111) haveidx: Arg<u32>fields. This asymmetry may be intentional based on CPython's opcode definitions, but should be verified.
| def get_analysis() -> analyser.Analysis: | ||
| analysis = analyzer.analyze_files([DEFAULT_INPUT]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in type annotation: analyser should be analyzer.
The function's return type annotation uses analyser.Analysis but the import on line 29 is analyzer. This will cause a NameError at runtime if type hints are evaluated.
-def get_analysis() -> analyser.Analysis:
+def get_analysis() -> analyzer.Analysis:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def get_analysis() -> analyser.Analysis: | |
| analysis = analyzer.analyze_files([DEFAULT_INPUT]) | |
| def get_analysis() -> analyzer.Analysis: | |
| analysis = analyzer.analyze_files([DEFAULT_INPUT]) |
🧰 Tools
🪛 Flake8 (7.3.0)
[error] 137-137: undefined name 'analyser'
(F821)
🤖 Prompt for AI Agents
In crates/compiler-core/generate.py around lines 137 to 138, the return type
annotation mistakenly references analyser.Analysis while the module is imported
as analyzer; update the annotation to analyzer.Analysis to match the import (or
change the import to analyser if that was intended) so the type name resolves
correctly; ensure any other occurrences use the same identifier to avoid
NameError.
| # We don't differentiate between real and pseudos yet | ||
| analysis.instructions |= analysis.pseudos | ||
| return analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[repr(u8)] incompatible with pseudo-instruction IDs.
Line 141 merges analysis.pseudos into analysis.instructions, but pseudo-instructions have IDs > 255 (e.g., Jump = 256). Combined with the hardcoded #[repr(u8)] on line 150, this produces an enum that won't compile.
Either:
- Filter out instructions with IDs > 255:
if opcodes.get(name, {}).get("enabled", True) and analysis.opmap.get(name, 0) <= 255- Or change the repr based on max ID:
max_id = max(inst.id for inst in instructions)
repr_type = "u16" if max_id > 255 else "u8"
# Then use {repr_type} in the templateAlso applies to: 148-151
🤖 Prompt for AI Agents
crates/compiler-core/generate.py around lines 140-142 and 148-151: after merging
analysis.pseudos into analysis.instructions the enum is generated with a
hardcoded #[repr(u8)] which fails if any instruction id > 255 (pseudo IDs like
256); fix by computing the maximum instruction id after the merge (e.g., max_id
= max(inst.id for inst in analysis.instructions)) and choose repr_type = "u16"
if max_id > 255 else "u8", then inject {repr_type} into the template instead of
the hardcoded u8; alternatively, if you prefer to exclude pseudos from the enum,
filter analysis.pseudos when merging (only include ops with enabled and op id <=
255) before generating the enum.
| output = rustfmt( | ||
| f""" | ||
| // This file is generated by {script_path} | ||
| // Do not edit! | ||
|
|
||
| use crate::bytecode::{{Arg, Label, NameIdx}}; | ||
|
|
||
| {generated} | ||
| """ | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generated imports are incomplete.
The import statement on line 181 only includes Arg, Label, NameIdx, but the generated enum uses additional types from the bytecode module:
IntrinsicFunction1(used inCallIntrinsic1)IntrinsicFunction2(used inCallIntrinsic2)ResumeKind(used inResume)
The generator should dynamically collect all oparg types used by instructions and include them in the import:
+def collect_oparg_types(instructions: list[Inst]) -> set[str]:
+ types = set()
+ for inst in instructions:
+ if inst.has_oparg and inst.oparg_typ.name not in ("u32",):
+ types.add(inst.oparg_typ.name)
+ return types
def main():
# ... existing code ...
+ oparg_types = collect_oparg_types(instructions)
+ imports = ", ".join(sorted(["Arg", "Label", "NameIdx"] + list(oparg_types)))
output = rustfmt(
f"""
// This file is generated by {script_path}
// Do not edit!
- use crate::bytecode::{{Arg, Label, NameIdx}};
+ use crate::bytecode::{{{imports}}};
{generated}
"""
)Committable suggestion skipped: line range outside the PR's diff.
| # The script will try to detect the properties for each opcode, but it can be useful to override some prooerties. So Ideally, | ||
| # we want to have this file empty because that means that we are fully aligned with CPython's opcodes. | ||
| # | ||
| # enabled : bool | ||
| # When false the opcode will not apear as a variant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typos in documentation comments.
There are two typos in the header comments:
- Line 3: "prooerties" → "properties"
- Line 7: "apear" → "appear"
-# The script will try to detect the properties for each opcode, but it can be useful to override some prooerties. So Ideally,
+# The script will try to detect the properties for each opcode, but it can be useful to override some properties. So ideally,
# we want to have this file empty because that means that we are fully aligned with CPython's opcodes.
#
# enabled : bool
-# When false the opcode will not apear as a variant.
+# When false the opcode will not appear as a variant.🤖 Prompt for AI Agents
crates/compiler-core/instructions.toml lines 3-7: fix typos in the header
comment by replacing "prooerties" with "properties" and "apear" with "appear" so
the documentation reads correctly.
| // This file is generated by crates/compiler-core/generate.py | ||
| // Do not edit! | ||
|
|
||
| use crate::bytecode::{Arg, Label, NameIdx}; | ||
|
|
||
| /// A Single bytecode instruction. | ||
| #[repr(u8)] | ||
| #[derive(Clone, Copy, Debug, Eq, PartialEq)] | ||
| pub enum Instruction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing imports for oparg types used in enum variants.
The generated code uses IntrinsicFunction1, IntrinsicFunction2, and ResumeKind as generic type parameters for Arg<T> (lines 32-33, 154), but these types are not imported. The import statement only includes Arg, Label, and NameIdx.
The generator script should be updated to include these types in the import statement:
-use crate::bytecode::{Arg, Label, NameIdx};
+use crate::bytecode::{Arg, IntrinsicFunction1, IntrinsicFunction2, Label, NameIdx, ResumeKind};Or the generator should dynamically determine which oparg types are used and include them in the imports.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // This file is generated by crates/compiler-core/generate.py | |
| // Do not edit! | |
| use crate::bytecode::{Arg, Label, NameIdx}; | |
| /// A Single bytecode instruction. | |
| #[repr(u8)] | |
| #[derive(Clone, Copy, Debug, Eq, PartialEq)] | |
| pub enum Instruction { | |
| // This file is generated by crates/compiler-core/generate.py | |
| // Do not edit! | |
| use crate::bytecode::{Arg, IntrinsicFunction1, IntrinsicFunction2, Label, NameIdx, ResumeKind}; | |
| /// A Single bytecode instruction. | |
| #[repr(u8)] | |
| #[derive(Clone, Copy, Debug, Eq, PartialEq)] | |
| pub enum Instruction { |
🤖 Prompt for AI Agents
In crates/compiler-core/src/bytecode/instruction.rs around lines 1 to 9, the
generated file imports only Arg, Label, and NameIdx but the enum uses oparg
types IntrinsicFunction1, IntrinsicFunction2, and ResumeKind as generic
parameters for Arg later in the file; update the generator so the use statement
also imports those types (or make the generator detect which oparg types are
referenced and include them) so the produced file contains a single import
including Arg, Label, NameIdx, IntrinsicFunction1, IntrinsicFunction2, and
ResumeKind.
| #[repr(u8)] | ||
| #[derive(Clone, Copy, Debug, Eq, PartialEq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[repr(u8)] incompatible with discriminant values exceeding 255.
The enum is declared with #[repr(u8)] but contains multiple variants with discriminant values that exceed the u8 range (0-255):
Jump { target: Arg<Label> } = 256JumpNoInterrupt { target: Arg<Label> } = 257LoadClosure { i: Arg<u32> } = 258LoadMethod { namei: Arg<NameIdx> } = 259LoadSuperMethod { namei: Arg<NameIdx> } = 260LoadZeroSuperAttr { namei: Arg<NameIdx> } = 261LoadZeroSuperMethod { namei: Arg<NameIdx> } = 262PopBlock = 263SetupCleanup = 264SetupFinally = 265SetupWith = 266StoreFastMaybeNull { idx: Arg<u32> } = 267
This will fail to compile. The generator should either use #[repr(u16)] or filter out pseudo-instructions with IDs > 255.
Also applies to: 92-96, 114-114, 127-127, 132-134, 143-143, 164-166, 174-174
🤖 Prompt for AI Agents
In crates/compiler-core/src/bytecode/instruction.rs around lines 7-8 (and also
affecting the variant ranges listed at 92-96, 114, 127, 132-134, 143, 164-166,
174), the enum is annotated #[repr(u8)] but contains discriminant values >255
which is invalid; change the representation to #[repr(u16)] or otherwise ensure
generated discriminant values fit in u8 (e.g., filter/remove pseudo-instructions
with IDs >255) so the repr matches the largest discriminant; update the
attribute to #[repr(u16)] if you intend to keep those variants, or adjust the
generator to cap or exclude IDs above 255 to retain #[repr(u8)].
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Note
This is very much at a POC stage ATM
The idea here is to auto-generate as much as possible, whether it's oparg field name (
namei/consti/target/etc), the oparg type, oparg ID and opcode name.+
An important feature using the config file is to have the ability to control every attribute in every opcode, we can exclude opcodes that are not supported yet or add new ones that do not appear in the targeted CPython version (not supported yet, but planning to add it).
This will ensure consistency for the enum variant.
lmk what you think:)
cc: @coolreader18 @youknowone
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.