POC: Configurable `Instruction` autogen #6316

ShaharNaveh · 2025-11-30T20:08:50Z

Note

This is very much at a POC stage ATM

The idea here is to auto-generate as much as possible, whether it's oparg field name (namei/consti/target/etc), the oparg type, oparg ID and opcode name.
+
An important feature using the config file is to have the ability to control every attribute in every opcode, we can exclude opcodes that are not supported yet or add new ones that do not appear in the targeted CPython version (not supported yet, but planning to add it).

This will ensure consistency for the enum variant.

lmk what you think:)

cc: @coolreader18 @youknowone

Summary by CodeRabbit

New Features
- Enhanced bytecode instruction support with improved opcode handling and generation capabilities.
Chores
- Integrated Python 3.13.9 compatibility and updated build tooling infrastructure.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-11-30T20:09:11Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds infrastructure for auto-generating Rust bytecode instruction enums from CPython opcode definitions. Changes include a Git submodule pointing to CPython 3.13.9, a TOML configuration file specifying opcode variants, a Python generation script that parses the configuration and synthesizes Rust code, and a refactored bytecode module with the generated instruction enum.

Changes

Cohort / File(s)	Summary
Git infrastructure `.gitmodules`, `submodules/cpython-v3.13.9`	Adds Git submodule entry for CPython v3.13.9 repository and updates submodule pointer.
Configuration and generation `crates/compiler-core/instructions.toml`, `crates/compiler-core/generate.py`	Introduces TOML configuration file defining opcode variants with metadata (name, oparg properties, type mappings) and marks disabled opcodes. Adds Python generation script that parses the configuration, analyzes CPython instruction data, generates Rust enum variants, and formats output via rustfmt.
Rust bytecode module `crates/compiler-core/src/bytecode.rs`, `crates/compiler-core/src/bytecode/instruction.rs`	Refactors bytecode module to declare a separate `instruction` submodule and re-exports public `Instruction` type. Removes in-file enum definition. Adds auto-generated `Instruction` enum in new `instruction.rs` with `#[repr(u8)]`, deriving `Clone, Copy, Debug, Eq, PartialEq`, comprising comprehensive opcode variants with associated data fields using typed argument wrappers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

crates/compiler-core/generate.py: Verify parsing logic, TOML deserialization, CPython data analysis, and Rust code synthesis correctness; ensure casing conversions and rustfmt integration work as intended.
crates/compiler-core/instructions.toml: Validate opcode names, discriminant mappings, oparg type definitions, and consistency with CPython 3.13.9 specification; confirm disabled opcodes list is intentional.
crates/compiler-core/src/bytecode.rs and instruction.rs: Confirm structural refactoring is complete; verify all existing references to Instruction still resolve correctly; check enum discriminant values align with CPython bytecode semantics.
submodules/cpython-v3.13.9: Confirm commit hash is correct and accessible.

Poem

🐰 Bytecode opcodes dancing free,
Generated from TOML and CPython spree,
From Python scripts to Rust so bright,
Instructions manifest—pure delight! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'POC: Configurable `Instruction` autogen' accurately describes the main change: introducing a proof-of-concept for auto-generating Instruction definitions with configurable opcodes via a new TOML configuration file.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/compiler-core/src/bytecode.rs (1)

1163-1183: Generated Instruction enum incompatible with existing impl blocks.

The label_arg() method references variants and field names that don't exist in the generated instruction.rs:

JumpIfNotExcMatch, JumpIfTrueOrPop, JumpIfFalseOrPop, SetupExcept, SetupAsyncWith, Break, Continue - not in generated enum

Field names like target, handler, end don't match the generated fields (e.g., delta is used in generated code)

PopJumpIfTrue { target: l } but generated has PopJumpIfTrue { delta: Arg<Label> }

This pattern extends throughout stack_effect() (lines 1220-1344) and fmt_dis() (lines 1413-1522), which reference ~50+ variants not present in the generated enum.

Since this is a POC, consider either:

Keeping the old Instruction enum alongside the generated one until feature parity is reached

Updating the generator to produce all required variants and field names

Adding a note in the PR description about the current compilation status

🧹 Nitpick comments (1)

crates/compiler-core/src/bytecode.rs (1)
15-17: Module declaration order is unconventional.

The re-export (pub use) appears before the module declaration (mod instruction). While this compiles, the conventional order is to declare modules first, then re-export from them.
-pub use crate::bytecode::instruction::Instruction;
-
 mod instruction;
+
+pub use crate::bytecode::instruction::Instruction;

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e0a86d and ce01644.

📒 Files selected for processing (6)

.gitmodules (1 hunks)
crates/compiler-core/generate.py (1 hunks)
crates/compiler-core/instructions.toml (1 hunks)
crates/compiler-core/src/bytecode.rs (1 hunks)
crates/compiler-core/src/bytecode/instruction.rs (1 hunks)
submodules/cpython-v3.13.9 (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

crates/compiler-core/generate.py

**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

crates/compiler-core/src/bytecode/instruction.rs
crates/compiler-core/src/bytecode.rs

🪛 Flake8 (7.3.0)

crates/compiler-core/generate.py

[error] 137-137: undefined name 'analyser'

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Run tests under miri
GitHub Check: Run snippets and cpython tests (windows-latest)
GitHub Check: Run rust tests (windows-latest)
GitHub Check: Check Rust code with clippy
GitHub Check: Ensure compilation on various targets

🔇 Additional comments (6)

submodules/cpython-v3.13.9 (1)

1-1: No changes needed to submodule configuration.

The commit hash 8183fa5e3f78ca6ab862de7fb8b14f3d929421e0 is the official CPython 3.13.9 release commit, and pinning to it is an explicit, stable approach. Shallow cloning is appropriate here since code generation typically requires only the current state of opcode definitions, not repository history.

crates/compiler-core/instructions.toml (2)

17-23: LGTM!

The active opcode configurations with overrides are well-documented. The use of r#where for the Rust keyword escape and the comment explaining the 3.13 oparg change for YIELD_VALUE are helpful.

25-61: LGTM!

The disabled opcodes section appropriately excludes specialized optimization variants and instrumentation opcodes that are not yet implemented. This is a reasonable approach for the POC stage.

crates/compiler-core/generate.py (2)

33-66: LGTM!

The Inst class properly handles override configurations and falls back to analyzed properties. The variant generation correctly produces Rust enum syntax with optional oparg fields.

68-100: TODOs noted for future improvement.

The Oparg.from_properties method has several TODO comments for adding proper types (ConstIdx for uses_co_consts, etc.) and raising errors instead of falling back to u32. These are appropriate for a POC but should be tracked for completion before this feature is fully integrated.

Consider opening issues to track:

Replace u32 fallback with proper ConstIdx type for uses_co_consts

Add proper type for uses_locals

Raise error instead of silent u32 fallback in the else branch

crates/compiler-core/src/bytecode/instruction.rs (1)

168-169: Verify unit variants for StoreAttr* instructions.

StoreAttrInstanceValue and StoreAttrSlot are defined as unit variants (no oparg), while similar LoadAttr* variants (lines 103, 111) have idx: Arg<u32> fields. This asymmetry may be intentional based on CPython's opcode definitions, but should be verified.

.gitmodules

coderabbitai · 2025-11-30T20:14:36Z

crates/compiler-core/generate.py

+def get_analysis() -> analyser.Analysis:
+    analysis = analyzer.analyze_files([DEFAULT_INPUT])


⚠️ Potential issue | 🟡 Minor

Typo in type annotation: analyser should be analyzer.

The function's return type annotation uses analyser.Analysis but the import on line 29 is analyzer. This will cause a NameError at runtime if type hints are evaluated.

-def get_analysis() -> analyser.Analysis: +def get_analysis() -> analyzer.Analysis:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_analysis() -> analyser.Analysis:

analysis = analyzer.analyze_files([DEFAULT_INPUT])

def get_analysis() -> analyzer.Analysis:

analysis = analyzer.analyze_files([DEFAULT_INPUT])

🧰 Tools

🪛 Flake8 (7.3.0)

[error] 137-137: undefined name 'analyser'

(F821)

🤖 Prompt for AI Agents

In crates/compiler-core/generate.py around lines 137 to 138, the return type annotation mistakenly references analyser.Analysis while the module is imported as analyzer; update the annotation to analyzer.Analysis to match the import (or change the import to analyser if that was intended) so the type name resolves correctly; ensure any other occurrences use the same identifier to avoid NameError.

coderabbitai · 2025-11-30T20:14:36Z

crates/compiler-core/generate.py

+    # We don't differentiate between real and pseudos yet
+    analysis.instructions |= analysis.pseudos
+    return analysis


⚠️ Potential issue | 🔴 Critical

#[repr(u8)] incompatible with pseudo-instruction IDs.

Line 141 merges analysis.pseudos into analysis.instructions, but pseudo-instructions have IDs > 255 (e.g., Jump = 256). Combined with the hardcoded #[repr(u8)] on line 150, this produces an enum that won't compile.

Either:

Filter out instructions with IDs > 255:

if opcodes.get(name, {}).get("enabled", True) and analysis.opmap.get(name, 0) <= 255

Or change the repr based on max ID:

max_id = max(inst.id for inst in instructions) repr_type = "u16" if max_id > 255 else "u8" # Then use {repr_type} in the template

Also applies to: 148-151

🤖 Prompt for AI Agents

crates/compiler-core/generate.py around lines 140-142 and 148-151: after merging analysis.pseudos into analysis.instructions the enum is generated with a hardcoded #[repr(u8)] which fails if any instruction id > 255 (pseudo IDs like 256); fix by computing the maximum instruction id after the merge (e.g., max_id = max(inst.id for inst in analysis.instructions)) and choose repr_type = "u16" if max_id > 255 else "u8", then inject {repr_type} into the template instead of the hardcoded u8; alternatively, if you prefer to exclude pseudos from the enum, filter analysis.pseudos when merging (only include ops with enabled and op id <= 255) before generating the enum.

coderabbitai · 2025-11-30T20:14:37Z

crates/compiler-core/generate.py

+    output = rustfmt(
+        f"""
+    // This file is generated by {script_path}
+    // Do not edit!
+
+    use crate::bytecode::{{Arg, Label, NameIdx}};
+
+    {generated}
+    """
+    )


⚠️ Potential issue | 🟠 Major

Generated imports are incomplete.

The import statement on line 181 only includes Arg, Label, NameIdx, but the generated enum uses additional types from the bytecode module:

IntrinsicFunction1 (used in CallIntrinsic1)

IntrinsicFunction2 (used in CallIntrinsic2)

ResumeKind (used in Resume)

The generator should dynamically collect all oparg types used by instructions and include them in the import:

+def collect_oparg_types(instructions: list[Inst]) -> set[str]: + types = set() + for inst in instructions: + if inst.has_oparg and inst.oparg_typ.name not in ("u32",): + types.add(inst.oparg_typ.name) + return types def main(): # ... existing code ... + oparg_types = collect_oparg_types(instructions) + imports = ", ".join(sorted(["Arg", "Label", "NameIdx"] + list(oparg_types))) output = rustfmt( f""" // This file is generated by {script_path} // Do not edit! - use crate::bytecode::{{Arg, Label, NameIdx}}; + use crate::bytecode::{{{imports}}}; {generated} """ )

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2025-11-30T20:14:37Z

crates/compiler-core/instructions.toml

+# The script will try to detect the properties for each opcode, but it can be useful to override some prooerties. So Ideally,
+# we want to have this file empty because that means that we are fully aligned with CPython's opcodes.
+#
+# enabled : bool
+#   When false the opcode will not apear as a variant.


⚠️ Potential issue | 🟡 Minor

Fix typos in documentation comments.

There are two typos in the header comments:

Line 3: "prooerties" → "properties"

Line 7: "apear" → "appear"

-# The script will try to detect the properties for each opcode, but it can be useful to override some prooerties. So Ideally, +# The script will try to detect the properties for each opcode, but it can be useful to override some properties. So ideally, # we want to have this file empty because that means that we are fully aligned with CPython's opcodes. # # enabled : bool -# When false the opcode will not apear as a variant. +# When false the opcode will not appear as a variant.

🤖 Prompt for AI Agents

crates/compiler-core/instructions.toml lines 3-7: fix typos in the header comment by replacing "prooerties" with "properties" and "apear" with "appear" so the documentation reads correctly.

coderabbitai · 2025-11-30T20:14:37Z

crates/compiler-core/src/bytecode/instruction.rs

+// This file is generated by crates/compiler-core/generate.py
+// Do not edit!
+
+use crate::bytecode::{Arg, Label, NameIdx};
+
+/// A Single bytecode instruction.
+#[repr(u8)]
+#[derive(Clone, Copy, Debug, Eq, PartialEq)]
+pub enum Instruction {


⚠️ Potential issue | 🔴 Critical

Missing imports for oparg types used in enum variants.

The generated code uses IntrinsicFunction1, IntrinsicFunction2, and ResumeKind as generic type parameters for Arg<T> (lines 32-33, 154), but these types are not imported. The import statement only includes Arg, Label, and NameIdx.

The generator script should be updated to include these types in the import statement:

-use crate::bytecode::{Arg, Label, NameIdx}; +use crate::bytecode::{Arg, IntrinsicFunction1, IntrinsicFunction2, Label, NameIdx, ResumeKind};

Or the generator should dynamically determine which oparg types are used and include them in the imports.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// This file is generated by crates/compiler-core/generate.py

// Do not edit!

use crate::bytecode::{Arg, Label, NameIdx};

/// A Single bytecode instruction.

#[repr(u8)]

#[derive(Clone, Copy, Debug, Eq, PartialEq)]

pub enum Instruction {

// This file is generated by crates/compiler-core/generate.py

// Do not edit!

use crate::bytecode::{Arg, IntrinsicFunction1, IntrinsicFunction2, Label, NameIdx, ResumeKind};

/// A Single bytecode instruction.

#[repr(u8)]

#[derive(Clone, Copy, Debug, Eq, PartialEq)]

pub enum Instruction {

🤖 Prompt for AI Agents

In crates/compiler-core/src/bytecode/instruction.rs around lines 1 to 9, the generated file imports only Arg, Label, and NameIdx but the enum uses oparg types IntrinsicFunction1, IntrinsicFunction2, and ResumeKind as generic parameters for Arg later in the file; update the generator so the use statement also imports those types (or make the generator detect which oparg types are referenced and include them) so the produced file contains a single import including Arg, Label, NameIdx, IntrinsicFunction1, IntrinsicFunction2, and ResumeKind.

coderabbitai · 2025-11-30T20:14:37Z

crates/compiler-core/src/bytecode/instruction.rs

+#[repr(u8)]
+#[derive(Clone, Copy, Debug, Eq, PartialEq)]


⚠️ Potential issue | 🔴 Critical

#[repr(u8)] incompatible with discriminant values exceeding 255.

The enum is declared with #[repr(u8)] but contains multiple variants with discriminant values that exceed the u8 range (0-255):

Jump { target: Arg<Label> } = 256

JumpNoInterrupt { target: Arg<Label> } = 257

LoadClosure { i: Arg<u32> } = 258

LoadMethod { namei: Arg<NameIdx> } = 259

LoadSuperMethod { namei: Arg<NameIdx> } = 260

LoadZeroSuperAttr { namei: Arg<NameIdx> } = 261

LoadZeroSuperMethod { namei: Arg<NameIdx> } = 262

PopBlock = 263

SetupCleanup = 264

SetupFinally = 265

SetupWith = 266

StoreFastMaybeNull { idx: Arg<u32> } = 267

This will fail to compile. The generator should either use #[repr(u16)] or filter out pseudo-instructions with IDs > 255.

Also applies to: 92-96, 114-114, 127-127, 132-134, 143-143, 164-166, 174-174

🤖 Prompt for AI Agents

In crates/compiler-core/src/bytecode/instruction.rs around lines 7-8 (and also affecting the variant ranges listed at 92-96, 114, 127, 132-134, 143, 164-166, 174), the enum is annotated #[repr(u8)] but contains discriminant values >255 which is invalid; change the representation to #[repr(u16)] or otherwise ensure generated discriminant values fit in u8 (e.g., filter/remove pseudo-instructions with IDs >255) so the repr matches the largest discriminant; update the attribute to #[repr(u16)] if you intend to keep those variants, or adjust the generator to cap or exclude IDs above 255 to retain #[repr(u8)].

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

ShaharNaveh added 3 commits November 30, 2025 10:46

Move Instruction to a dedicated file

5fb9cf8

Add cpython 3.13.9 as a submodule

9e82fcb

POC: Configurable autogen instruction

ce01644

ShaharNaveh marked this pull request as draft November 30, 2025 20:08

coderabbitai bot reviewed Nov 30, 2025

View reviewed changes

ShaharNaveh and others added 5 commits December 1, 2025 11:32

Don't generate instructions with id over 255

782b28e

Refactor code to use dataclasses

8ef2e60

Fix duplicate ids

8afef74

More conf

60f227e

Apply suggestion from @coderabbitai[bot]

b59d783

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

ShaharNaveh requested review from coolreader18 and youknowone December 2, 2025 15:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

POC: Configurable `Instruction` autogen #6316

POC: Configurable `Instruction` autogen #6316

Uh oh!

ShaharNaveh commented Nov 30, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 30, 2025 •

edited

Loading

Review skipped

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot Nov 30, 2025

Uh oh!

coderabbitai bot Nov 30, 2025

Uh oh!

coderabbitai bot Nov 30, 2025

Uh oh!

coderabbitai bot Nov 30, 2025

Uh oh!

coderabbitai bot Nov 30, 2025

Uh oh!

coderabbitai bot Nov 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		def get_analysis() -> analyser.Analysis:
		analysis = analyzer.analyze_files([DEFAULT_INPUT])

POC: Configurable Instruction autogen #6316

Are you sure you want to change the base?

POC: Configurable Instruction autogen #6316

Uh oh!

Conversation

ShaharNaveh commented Nov 30, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

POC: Configurable `Instruction` autogen #6316

POC: Configurable `Instruction` autogen #6316

ShaharNaveh commented Nov 30, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 30, 2025 •

edited

Loading