Skip to content

Conversation

@ShaharNaveh
Copy link
Collaborator

@ShaharNaveh ShaharNaveh commented Sep 23, 2025

This PR's original goal was to auto generate an enum that will replace:

/// A Single bytecode instruction.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
#[repr(u8)]
pub enum Instruction {
Nop,
/// Importing by name
ImportName {
idx: Arg<NameIdx>,
},
/// Importing without name
ImportNameless,
/// from ... import ...
ImportFrom {
idx: Arg<NameIdx>,
},
LoadFast(Arg<NameIdx>),
LoadNameAny(Arg<NameIdx>),
LoadGlobal(Arg<NameIdx>),
LoadDeref(Arg<NameIdx>),
LoadClassDeref(Arg<NameIdx>),
StoreFast(Arg<NameIdx>),
StoreLocal(Arg<NameIdx>),
StoreGlobal(Arg<NameIdx>),
StoreDeref(Arg<NameIdx>),
DeleteFast(Arg<NameIdx>),
DeleteLocal(Arg<NameIdx>),
DeleteGlobal(Arg<NameIdx>),
DeleteDeref(Arg<NameIdx>),
LoadClosure(Arg<NameIdx>),
Subscript,
StoreSubscript,
DeleteSubscript,
StoreAttr {
idx: Arg<NameIdx>,
},
DeleteAttr {
idx: Arg<NameIdx>,
},
LoadConst {
/// index into constants vec
idx: Arg<u32>,
},
UnaryOperation {
op: Arg<UnaryOperator>,
},
BinaryOperation {
op: Arg<BinaryOperator>,
},
BinaryOperationInplace {
op: Arg<BinaryOperator>,
},
BinarySubscript,
LoadAttr {
idx: Arg<NameIdx>,
},
TestOperation {
op: Arg<TestOperator>,
},
CompareOperation {
op: Arg<ComparisonOperator>,
},
CopyItem {
index: Arg<u32>,
},
Pop,
Swap {
index: Arg<u32>,
},
ToBool,
Rotate2,
Rotate3,
Duplicate,
Duplicate2,
GetIter,
GetLen,
CallIntrinsic1 {
func: Arg<IntrinsicFunction1>,
},
CallIntrinsic2 {
func: Arg<IntrinsicFunction2>,
},
Continue {
target: Arg<Label>,
},
Break {
target: Arg<Label>,
},
Jump {
target: Arg<Label>,
},
/// Pop the top of the stack, and jump if this value is true.
PopJumpIfTrue {
target: Arg<Label>,
},
/// Pop the top of the stack, and jump if this value is false.
PopJumpIfFalse {
target: Arg<Label>,
},
/// Peek at the top of the stack, and jump if this value is true.
/// Otherwise, pop top of stack.
JumpIfTrueOrPop {
target: Arg<Label>,
},
/// Peek at the top of the stack, and jump if this value is false.
/// Otherwise, pop top of stack.
JumpIfFalseOrPop {
target: Arg<Label>,
},
MakeFunction,
SetFunctionAttribute {
attr: Arg<MakeFunctionFlags>,
},
CallFunctionPositional {
nargs: Arg<u32>,
},
CallFunctionKeyword {
nargs: Arg<u32>,
},
CallFunctionEx {
has_kwargs: Arg<bool>,
},
LoadMethod {
idx: Arg<NameIdx>,
},
CallMethodPositional {
nargs: Arg<u32>,
},
CallMethodKeyword {
nargs: Arg<u32>,
},
CallMethodEx {
has_kwargs: Arg<bool>,
},
ForIter {
target: Arg<Label>,
},
ReturnValue,
ReturnConst {
idx: Arg<u32>,
},
YieldValue,
YieldFrom,
/// Resume execution (e.g., at function start, after yield, etc.)
Resume {
arg: Arg<u32>,
},
SetupAnnotation,
SetupLoop,
/// Setup a finally handler, which will be called whenever one of this events occurs:
/// - the block is popped
/// - the function returns
/// - an exception is returned
SetupFinally {
handler: Arg<Label>,
},
/// Enter a finally block, without returning, excepting, just because we are there.
EnterFinally,
/// Marker bytecode for the end of a finally sequence.
/// When this bytecode is executed, the eval loop does one of those things:
/// - Continue at a certain bytecode position
/// - Propagate the exception
/// - Return from a function
/// - Do nothing at all, just continue
EndFinally,
SetupExcept {
handler: Arg<Label>,
},
SetupWith {
end: Arg<Label>,
},
WithCleanupStart,
WithCleanupFinish,
PopBlock,
Raise {
kind: Arg<RaiseKind>,
},
BuildString {
size: Arg<u32>,
},
BuildTuple {
size: Arg<u32>,
},
BuildTupleFromTuples {
size: Arg<u32>,
},
BuildTupleFromIter,
BuildList {
size: Arg<u32>,
},
BuildListFromTuples {
size: Arg<u32>,
},
BuildSet {
size: Arg<u32>,
},
BuildSetFromTuples {
size: Arg<u32>,
},
BuildMap {
size: Arg<u32>,
},
BuildMapForCall {
size: Arg<u32>,
},
DictUpdate {
index: Arg<u32>,
},
BuildSlice {
/// whether build a slice with a third step argument
step: Arg<bool>,
},
ListAppend {
i: Arg<u32>,
},
SetAdd {
i: Arg<u32>,
},
MapAdd {
i: Arg<u32>,
},
PrintExpr,
LoadBuildClass,
UnpackSequence {
size: Arg<u32>,
},
UnpackEx {
args: Arg<UnpackExArgs>,
},
FormatValue {
conversion: Arg<ConversionFlag>,
},
PopException,
Reverse {
amount: Arg<u32>,
},
GetAwaitable,
BeforeAsyncWith,
SetupAsyncWith {
end: Arg<Label>,
},
GetAIter,
GetANext,
EndAsyncFor,
MatchMapping,
MatchSequence,
MatchKeys,
MatchClass(Arg<u32>),
ExtendedArg,
// If you add a new instruction here, be sure to keep LAST_INSTRUCTION updated
}

impl Instruction {
/// Gets the label stored inside this instruction, if it exists
#[inline]
pub const fn label_arg(&self) -> Option<Arg<Label>> {
match self {
Jump { target: l }
| PopJumpIfTrue { target: l }
| PopJumpIfFalse { target: l }
| JumpIfTrueOrPop { target: l }
| JumpIfFalseOrPop { target: l }
| ForIter { target: l }
| SetupFinally { handler: l }
| SetupExcept { handler: l }
| SetupWith { end: l }
| SetupAsyncWith { end: l }
| Break { target: l }
| Continue { target: l } => Some(*l),
_ => None,
}
}
/// Whether this is an unconditional branching
///
/// # Examples
///
/// ```
/// use rustpython_compiler_core::bytecode::{Arg, Instruction};
/// let jump_inst = Instruction::Jump { target: Arg::marker() };
/// assert!(jump_inst.unconditional_branch())
/// ```
pub const fn unconditional_branch(&self) -> bool {
matches!(
self,
Jump { .. }
| Continue { .. }
| Break { .. }
| ReturnValue
| ReturnConst { .. }
| Raise { .. }
)
}
/// What effect this instruction has on the stack
///
/// # Examples
///
/// ```
/// use rustpython_compiler_core::bytecode::{Arg, Instruction, Label, UnaryOperator};
/// let (target, jump_arg) = Arg::new(Label(0xF));
/// let jump_instruction = Instruction::Jump { target };
/// let (op, invert_arg) = Arg::new(UnaryOperator::Invert);
/// let invert_instruction = Instruction::UnaryOperation { op };
/// assert_eq!(jump_instruction.stack_effect(jump_arg, true), 0);
/// assert_eq!(invert_instruction.stack_effect(invert_arg, false), 0);
/// ```
///
pub fn stack_effect(&self, arg: OpArg, jump: bool) -> i32 {
match self {
Nop => 0,
ImportName { .. } | ImportNameless => -1,
ImportFrom { .. } => 1,
LoadFast(_) | LoadNameAny(_) | LoadGlobal(_) | LoadDeref(_) | LoadClassDeref(_) => 1,
StoreFast(_) | StoreLocal(_) | StoreGlobal(_) | StoreDeref(_) => -1,
DeleteFast(_) | DeleteLocal(_) | DeleteGlobal(_) | DeleteDeref(_) => 0,
LoadClosure(_) => 1,
Subscript => -1,
StoreSubscript => -3,
DeleteSubscript => -2,
LoadAttr { .. } => 0,
StoreAttr { .. } => -2,
DeleteAttr { .. } => -1,
LoadConst { .. } => 1,
UnaryOperation { .. } => 0,
BinaryOperation { .. }
| BinaryOperationInplace { .. }
| TestOperation { .. }
| CompareOperation { .. } => -1,
BinarySubscript => -1,
CopyItem { .. } => 1,
Pop => -1,
Swap { .. } => 0,
ToBool => 0,
Rotate2 | Rotate3 => 0,
Duplicate => 1,
Duplicate2 => 2,
GetIter => 0,
GetLen => 1,
CallIntrinsic1 { .. } => 0, // Takes 1, pushes 1
CallIntrinsic2 { .. } => -1, // Takes 2, pushes 1
Continue { .. } => 0,
Break { .. } => 0,
Jump { .. } => 0,
PopJumpIfTrue { .. } | PopJumpIfFalse { .. } => -1,
JumpIfTrueOrPop { .. } | JumpIfFalseOrPop { .. } => {
if jump {
0
} else {
-1
}
}
MakeFunction => {
// CPython 3.13 style: MakeFunction only pops code object
-1 + 1 // pop code, push function
}
SetFunctionAttribute { .. } => {
// pops attribute value and function, pushes function back
-2 + 1
}
CallFunctionPositional { nargs } => -(nargs.get(arg) as i32) - 1 + 1,
CallMethodPositional { nargs } => -(nargs.get(arg) as i32) - 3 + 1,
CallFunctionKeyword { nargs } => -1 - (nargs.get(arg) as i32) - 1 + 1,
CallMethodKeyword { nargs } => -1 - (nargs.get(arg) as i32) - 3 + 1,
CallFunctionEx { has_kwargs } => -1 - (has_kwargs.get(arg) as i32) - 1 + 1,
CallMethodEx { has_kwargs } => -1 - (has_kwargs.get(arg) as i32) - 3 + 1,
LoadMethod { .. } => -1 + 3,
ForIter { .. } => {
if jump {
-1
} else {
1
}
}
ReturnValue => -1,
ReturnConst { .. } => 0,
Resume { .. } => 0,
YieldValue => 0,
YieldFrom => -1,
SetupAnnotation | SetupLoop | SetupFinally { .. } | EnterFinally | EndFinally => 0,
SetupExcept { .. } => jump as i32,
SetupWith { .. } => (!jump) as i32,
WithCleanupStart => 0,
WithCleanupFinish => -1,
PopBlock => 0,
Raise { kind } => -(kind.get(arg) as u8 as i32),
BuildString { size }
| BuildTuple { size, .. }
| BuildTupleFromTuples { size, .. }
| BuildList { size, .. }
| BuildListFromTuples { size, .. }
| BuildSet { size, .. }
| BuildSetFromTuples { size, .. } => -(size.get(arg) as i32) + 1,
BuildTupleFromIter => 0,
BuildMap { size } => {
let nargs = size.get(arg) * 2;
-(nargs as i32) + 1
}
BuildMapForCall { size } => {
let nargs = size.get(arg);
-(nargs as i32) + 1
}
DictUpdate { .. } => -1,
BuildSlice { step } => -2 - (step.get(arg) as i32) + 1,
ListAppend { .. } | SetAdd { .. } => -1,
MapAdd { .. } => -2,
PrintExpr => -1,
LoadBuildClass => 1,
UnpackSequence { size } => -1 + size.get(arg) as i32,
UnpackEx { args } => {
let UnpackExArgs { before, after } = args.get(arg);
-1 + before as i32 + 1 + after as i32
}
FormatValue { .. } => -1,
PopException => 0,
Reverse { .. } => 0,
GetAwaitable => 0,
BeforeAsyncWith => 1,
SetupAsyncWith { .. } => {
if jump {
-1
} else {
0
}
}
GetAIter => 0,
GetANext => 1,
EndAsyncFor => -2,
MatchMapping | MatchSequence => 1, // Push bool result
MatchKeys => 1, // Pop 2 (subject, keys), push 3 (subject, keys_or_none, values_or_none)
MatchClass(_) => -2,
ExtendedArg => 0,
}
}
pub fn display<'a>(
&'a self,
arg: OpArg,
ctx: &'a impl InstrDisplayContext,
) -> impl fmt::Display + 'a {
struct FmtFn<F>(F);
impl<F: Fn(&mut fmt::Formatter<'_>) -> fmt::Result> fmt::Display for FmtFn<F> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
(self.0)(f)
}
}
FmtFn(move |f: &mut fmt::Formatter<'_>| self.fmt_dis(arg, f, ctx, false, 0, 0))
}
#[allow(clippy::too_many_arguments)]
fn fmt_dis(
&self,
arg: OpArg,
f: &mut fmt::Formatter<'_>,
ctx: &impl InstrDisplayContext,
expand_code_objects: bool,
pad: usize,
level: usize,
) -> fmt::Result {
macro_rules! w {
($variant:ident) => {
write!(f, stringify!($variant))
};
($variant:ident, $map:ident = $arg_marker:expr) => {{
let arg = $arg_marker.get(arg);
write!(f, "{:pad$}({}, {})", stringify!($variant), arg, $map(arg))
}};
($variant:ident, $arg_marker:expr) => {
write!(f, "{:pad$}({})", stringify!($variant), $arg_marker.get(arg))
};
($variant:ident, ?$arg_marker:expr) => {
write!(
f,
"{:pad$}({:?})",
stringify!($variant),
$arg_marker.get(arg)
)
};
}
let varname = |i: u32| ctx.get_varname(i as usize);
let name = |i: u32| ctx.get_name(i as usize);
let cell_name = |i: u32| ctx.get_cell_name(i as usize);
let fmt_const =
|op: &str, arg: OpArg, f: &mut fmt::Formatter<'_>, idx: &Arg<u32>| -> fmt::Result {
let value = ctx.get_constant(idx.get(arg) as usize);
match value.borrow_constant() {
BorrowedConstant::Code { code } if expand_code_objects => {
write!(f, "{op:pad$}({code:?}):")?;
code.display_inner(f, true, level + 1)?;
Ok(())
}
c => {
write!(f, "{op:pad$}(")?;
c.fmt_display(f)?;
write!(f, ")")
}
}
};
match self {
Nop => w!(Nop),
ImportName { idx } => w!(ImportName, name = idx),
ImportNameless => w!(ImportNameless),
ImportFrom { idx } => w!(ImportFrom, name = idx),
LoadFast(idx) => w!(LoadFast, varname = idx),
LoadNameAny(idx) => w!(LoadNameAny, name = idx),
LoadGlobal(idx) => w!(LoadGlobal, name = idx),
LoadDeref(idx) => w!(LoadDeref, cell_name = idx),
LoadClassDeref(idx) => w!(LoadClassDeref, cell_name = idx),
StoreFast(idx) => w!(StoreFast, varname = idx),
StoreLocal(idx) => w!(StoreLocal, name = idx),
StoreGlobal(idx) => w!(StoreGlobal, name = idx),
StoreDeref(idx) => w!(StoreDeref, cell_name = idx),
DeleteFast(idx) => w!(DeleteFast, varname = idx),
DeleteLocal(idx) => w!(DeleteLocal, name = idx),
DeleteGlobal(idx) => w!(DeleteGlobal, name = idx),
DeleteDeref(idx) => w!(DeleteDeref, cell_name = idx),
LoadClosure(i) => w!(LoadClosure, cell_name = i),
Subscript => w!(Subscript),
StoreSubscript => w!(StoreSubscript),
DeleteSubscript => w!(DeleteSubscript),
StoreAttr { idx } => w!(StoreAttr, name = idx),
DeleteAttr { idx } => w!(DeleteAttr, name = idx),
LoadConst { idx } => fmt_const("LoadConst", arg, f, idx),
UnaryOperation { op } => w!(UnaryOperation, ?op),
BinaryOperation { op } => w!(BinaryOperation, ?op),
BinaryOperationInplace { op } => w!(BinaryOperationInplace, ?op),
BinarySubscript => w!(BinarySubscript),
LoadAttr { idx } => w!(LoadAttr, name = idx),
TestOperation { op } => w!(TestOperation, ?op),
CompareOperation { op } => w!(CompareOperation, ?op),
CopyItem { index } => w!(CopyItem, index),
Pop => w!(Pop),
Swap { index } => w!(Swap, index),
ToBool => w!(ToBool),
Rotate2 => w!(Rotate2),
Rotate3 => w!(Rotate3),
Duplicate => w!(Duplicate),
Duplicate2 => w!(Duplicate2),
GetIter => w!(GetIter),
// GET_LEN
GetLen => w!(GetLen),
CallIntrinsic1 { func } => w!(CallIntrinsic1, ?func),
CallIntrinsic2 { func } => w!(CallIntrinsic2, ?func),
Continue { target } => w!(Continue, target),
Break { target } => w!(Break, target),
Jump { target } => w!(Jump, target),
PopJumpIfTrue { target } => w!(PopJumpIfTrue, target),
PopJumpIfFalse { target } => w!(PopJumpIfFalse, target),
JumpIfTrueOrPop { target } => w!(JumpIfTrueOrPop, target),
JumpIfFalseOrPop { target } => w!(JumpIfFalseOrPop, target),
MakeFunction => w!(MakeFunction),
SetFunctionAttribute { attr } => w!(SetFunctionAttribute, ?attr),
CallFunctionPositional { nargs } => w!(CallFunctionPositional, nargs),
CallFunctionKeyword { nargs } => w!(CallFunctionKeyword, nargs),
CallFunctionEx { has_kwargs } => w!(CallFunctionEx, has_kwargs),
LoadMethod { idx } => w!(LoadMethod, name = idx),
CallMethodPositional { nargs } => w!(CallMethodPositional, nargs),
CallMethodKeyword { nargs } => w!(CallMethodKeyword, nargs),
CallMethodEx { has_kwargs } => w!(CallMethodEx, has_kwargs),
ForIter { target } => w!(ForIter, target),
ReturnValue => w!(ReturnValue),
ReturnConst { idx } => fmt_const("ReturnConst", arg, f, idx),
Resume { arg } => w!(Resume, arg),
YieldValue => w!(YieldValue),
YieldFrom => w!(YieldFrom),
SetupAnnotation => w!(SetupAnnotation),
SetupLoop => w!(SetupLoop),
SetupExcept { handler } => w!(SetupExcept, handler),
SetupFinally { handler } => w!(SetupFinally, handler),
EnterFinally => w!(EnterFinally),
EndFinally => w!(EndFinally),
SetupWith { end } => w!(SetupWith, end),
WithCleanupStart => w!(WithCleanupStart),
WithCleanupFinish => w!(WithCleanupFinish),
BeforeAsyncWith => w!(BeforeAsyncWith),
SetupAsyncWith { end } => w!(SetupAsyncWith, end),
PopBlock => w!(PopBlock),
Raise { kind } => w!(Raise, ?kind),
BuildString { size } => w!(BuildString, size),
BuildTuple { size } => w!(BuildTuple, size),
BuildTupleFromTuples { size } => w!(BuildTupleFromTuples, size),
BuildTupleFromIter => w!(BuildTupleFromIter),
BuildList { size } => w!(BuildList, size),
BuildListFromTuples { size } => w!(BuildListFromTuples, size),
BuildSet { size } => w!(BuildSet, size),
BuildSetFromTuples { size } => w!(BuildSetFromTuples, size),
BuildMap { size } => w!(BuildMap, size),
BuildMapForCall { size } => w!(BuildMapForCall, size),
DictUpdate { index } => w!(DictUpdate, index),
BuildSlice { step } => w!(BuildSlice, step),
ListAppend { i } => w!(ListAppend, i),
SetAdd { i } => w!(SetAdd, i),
MapAdd { i } => w!(MapAdd, i),
PrintExpr => w!(PrintExpr),
LoadBuildClass => w!(LoadBuildClass),
UnpackSequence { size } => w!(UnpackSequence, size),
UnpackEx { args } => w!(UnpackEx, args),
FormatValue { conversion } => w!(FormatValue, ?conversion),
PopException => w!(PopException),
Reverse { amount } => w!(Reverse, amount),
GetAwaitable => w!(GetAwaitable),
GetAIter => w!(GetAIter),
GetANext => w!(GetANext),
EndAsyncFor => w!(EndAsyncFor),
MatchMapping => w!(MatchMapping),
MatchSequence => w!(MatchSequence),
MatchKeys => w!(MatchKeys),
MatchClass(arg) => w!(MatchClass, arg),
ExtendedArg => w!(ExtendedArg, Arg::<u32>::marker()),
}
}
}

But turned out to be more work than originally anticipated.
From what I see ATM our largest gaps are that we have opcodes that doesn't exist in Cpython (3.13.7) such as

Continue {
target: Arg<Label>,
},
Break {
target: Arg<Label>,
},

So, I have converted this PR to address #6156 (review)


cc: @youknowone @coolreader18
Followup for #6156 (comment)

Summary by CodeRabbit

  • New Features

    • Public, strongly-typed opcode API with Real and Pseudo opcode types, unified conversions, and rich inspection utilities exposed throughout the runtime.
  • Bug Fixes

    • Improved opcode validation and more accurate stack-effect calculations; added special-case handling and adjusted a related configuration constant.
  • Chores

    • Added an opcode generation script, marked generated sources for tooling, updated spell dictionary, and re-exported opcode APIs for easier access.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 23, 2025

Walkthrough

Adds generated opcode enums (RealOpcode, PseudoOpcode) and an Opcode wrapper to compiler/core, a Python generator script to emit the generated file, stdlib and VM updates to use the new opcode types, marks the generated file in .gitattributes, adds dependencies, and updates a spelling dictionary token.

Changes

Cohort / File(s) Summary
Repo metadata
\.gitattributes
Marks compiler/core/src/opcodes.rs as linguist-generated.
Spelling / dictionary
\.cspell.dict/cpython.txt
Adds token deopt.
Core crate root
compiler/core/src/lib.rs
Adds pub mod opcode; and mod opcodes; and re-exports Opcode, PseudoOpcode, RealOpcode.
Opcode wrapper API
compiler/core/src/opcode.rs
New public enum Opcode { Real(RealOpcode), Pseudo(PseudoOpcode) } plus TryFrom impls for u8/u16 and many integer types; uses MarshalError and delegates pseudo→real conversion path.
Generated opcodes (big)
compiler/core/src/opcodes.rs
Generated file adding RealOpcode (repr(u8)) and PseudoOpcode (repr(u16)) enums, variants, validation (is_valid), predicates (has_*), num_popped/num_pushed/deopt, new_unchecked, and TryFrom impls for many integer types.
Opcode generator script
scripts/gen_opcodes.py
New generator that analyzes CPython opcode data and emits compiler/core/src/opcodes.rs (helpers: group_ranges, fmt_ranges, enum_variant_name; classes: InstructionsMeta, RealInstructions, PseudoInstructions; entry: main).
Stdlib integration
stdlib/src/opcode.rs
Reworked to use Opcode/RealOpcode/PseudoOpcode directly; stack_effect and predicate logic updated; ENABLE_SPECIALIZATION type changed from i8 to u8.
VM re-exports
vm/src/lib.rs
Adds opcode to public re-exports from rustpython_compiler_core (pub use ... {bytecode, frozen, opcode};).
Core Cargo config
compiler/core/Cargo.toml
Adds workspace dependencies num_enum and num_traits.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Std as stdlib::opcode::stack_effect
  participant Core as compiler_core::opcode::Opcode
  participant Real as compiler_core::opcodes::RealOpcode
  participant Pseudo as compiler_core::opcodes::PseudoOpcode

  Caller->>Std: stack_effect(raw_id, oparg)
  Std->>Core: Opcode::try_from(raw_id)
  alt Pseudo (u16)
    Core->>Pseudo: PseudoOpcode::try_from(raw)
    Pseudo-->>Std: PseudoOpcode
    Std->>Std: apply pseudo-specific stack logic
    Std-->>Caller: stack delta
  else Real (u8)
    Core->>Real: RealOpcode::try_from(raw_u8)
    Real-->>Std: RealOpcode
    Std->>Std: use RealOpcode::num_popped/num_pushed/deopt
    Std-->>Caller: stack delta
  else Invalid
    Std-->>Caller: invalid / error
  end
Loading
sequenceDiagram
  autonumber
  actor Dev
  participant Script as scripts/gen_opcodes.py
  participant Output as compiler/core/src/opcodes.rs
  participant Repo as Git

  Dev->>Script: run generator (analyze CPython)
  Script->>Output: emit enums, predicates, TryFrom impls, helpers
  Script->>Repo: write file & run cargo fmt
  Repo-->>Output: file added (marked in .gitattributes)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • youknowone

Poem

I hop through enums, nibble each name,
Turn raw bytes to opcodes, tidy and tame.
I sniff predicates, push, pop, and deopt,
Generated carrots in a neat little crop.
Hooray — compile-time crumbs make my whiskers hop! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately highlights the main change, which is the automatic generation of opcode metadata, without extraneous detail.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3daec5a and 9436354.

📒 Files selected for processing (1)
  • compiler/core/src/opcode.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • compiler/core/src/opcode.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Ensure compilation on various targets

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ShaharNaveh ShaharNaveh changed the title POC: Instruction auto generation POC: Opcode metadata auto generation Oct 5, 2025
@ShaharNaveh
Copy link
Collaborator Author

I have updated the main PR comment

cc: @youknowone @coolreader18

@ShaharNaveh ShaharNaveh marked this pull request as ready for review October 5, 2025 11:38
@ShaharNaveh ShaharNaveh changed the title POC: Opcode metadata auto generation Opcode metadata auto generation Oct 5, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
compiler/core/src/opcode.rs (3)

7-8: Consider adding accessor and additional trait implementations.

The OpcodeId newtype lacks a public accessor for the inner u16 value. Additionally, consider implementing Hash, Display, and potentially FromStr for better ergonomics.

Apply this diff to add an accessor:

 #[derive(Clone, Copy, Debug, Eq, PartialEq)]
 pub struct OpcodeId(u16);

 impl OpcodeId {
+    /// Returns the raw opcode ID value.
+    #[must_use]
+    pub const fn get(self) -> u16 {
+        self.0
+    }
+

For Hash, update the derives:

-#[derive(Clone, Copy, Debug, Eq, PartialEq)]
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Hash)]
 pub struct OpcodeId(u16);

235-243: Enhance safety documentation.

The safety documentation for new_unchecked should reference the is_valid method to clarify what constitutes a valid opcode ID.

Apply this diff:

     /// Creates a new opcode ID without checking the value is a valid opcode ID.
     ///
     /// # Safety
     ///
-    /// The value should be a valid opcode ID.
+    /// The caller must ensure that `id` satisfies `Self::is_valid(id)`.
     #[must_use]
     pub const unsafe fn new_unchecked(id: u16) -> Self {

545-560: Consider a more descriptive error type for TryFrom.

The TryFrom implementations use () as the error type, providing no information about why a conversion failed. While this is acceptable, a custom error type could improve debuggability.

Consider defining an error type:

/// Error type for invalid opcode ID conversions.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub struct InvalidOpcodeId;

impl std::fmt::Display for InvalidOpcodeId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "invalid opcode ID")
    }
}

impl std::error::Error for InvalidOpcodeId {}

Then update the macro:

 macro_rules! opcode_id_try_from_impl {
     ($t:ty) => {
         impl TryFrom<$t> for OpcodeId {
-            type Error = ();
+            type Error = InvalidOpcodeId;

             fn try_from(value: $t) -> Result<Self, Self::Error> {
-                let id = value.try_into().map_err(|_| ())?;
+                let id = value.try_into().map_err(|_| InvalidOpcodeId)?;
                 if Self::is_valid(id) {
                     Ok(Self(id))
                 } else {
-                    Err(())
+                    Err(InvalidOpcodeId)
                 }
             }
         }
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a6fda4 and db2775a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • .gitattributes (1 hunks)
  • compiler/core/src/lib.rs (1 hunks)
  • compiler/core/src/opcode.rs (1 hunks)
  • scripts/gen_opcodes.py (1 hunks)
  • stdlib/src/opcode.rs (3 hunks)
  • vm/src/lib.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • vm/src/lib.rs
  • compiler/core/src/lib.rs
  • compiler/core/src/opcode.rs
  • stdlib/src/opcode.rs
{vm,stdlib}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use RustPython macros (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • vm/src/lib.rs
  • stdlib/src/opcode.rs
🧬 Code graph analysis (3)
scripts/gen_opcodes.py (2)
compiler/core/src/opcode.rs (9)
  • is_pseudo (253-269)
  • is_valid (247-249)
  • has_arg (273-426)
  • has_const (430-435)
  • has_name (439-465)
  • has_jump (469-487)
  • has_free (491-500)
  • has_local (504-519)
  • has_exc (523-542)
stdlib/src/opcode.rs (8)
  • is_valid (70-72)
  • has_arg (75-77)
  • has_const (80-82)
  • has_name (85-87)
  • has_jump (90-92)
  • has_free (95-97)
  • has_local (100-102)
  • has_exc (105-107)
compiler/core/src/opcode.rs (1)
stdlib/src/opcode.rs (8)
  • is_valid (70-72)
  • has_arg (75-77)
  • has_const (80-82)
  • has_name (85-87)
  • has_jump (90-92)
  • has_free (95-97)
  • has_local (100-102)
  • has_exc (105-107)
stdlib/src/opcode.rs (3)
scripts/gen_opcodes.py (1)
  • Instruction (81-119)
compiler/core/src/bytecode.rs (1)
  • try_from (814-820)
compiler/core/src/opcode.rs (8)
  • is_valid (247-249)
  • has_arg (273-426)
  • has_const (430-435)
  • has_name (439-465)
  • has_jump (469-487)
  • has_free (491-500)
  • has_local (504-519)
  • has_exc (523-542)
🪛 Flake8 (7.3.0)
scripts/gen_opcodes.py

[error] 256-256: indentation contains mixed spaces and tabs

(E101)


[error] 257-257: indentation contains mixed spaces and tabs

(E101)


[error] 258-258: indentation contains mixed spaces and tabs

(E101)


[error] 259-259: indentation contains mixed spaces and tabs

(E101)


[error] 260-260: indentation contains mixed spaces and tabs

(E101)


[error] 261-261: indentation contains mixed spaces and tabs

(E101)


[error] 262-262: indentation contains mixed spaces and tabs

(E101)


[error] 263-263: indentation contains mixed spaces and tabs

(E101)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
🔇 Additional comments (3)
scripts/gen_opcodes.py (2)

75-78: ...


75-78: ...

compiler/core/src/opcode.rs (1)

11-233: Refine opcode verification scope
Adaptive and instrumented opcodes (e.g. CALL_ALLOC_AND_ENTER_INIT, INSTRUMENTED_*, etc.) are RustPython-specific and must not be removed; only verify that the standard CPython 3.13.7 opcodes (those without ADAPTIVE_ or INSTRUMENTED_ prefixes) match their CPython values.

Likely an incorrect or invalid review comment.

Comment on lines 521 to 523
/// Whether opcode ID have 'HAS_PURE_FLAG' set.
#[must_use]
pub const fn has_exc(&self) -> bool {
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix misleading documentation comment.

The documentation comment says "Whether opcode ID have 'HAS_PURE_FLAG' set" but the method is named has_exc. This appears to be a copy-paste error.

Apply this diff to fix the comment:

-    /// Whether opcode ID have 'HAS_PURE_FLAG' set.
+    /// Whether opcode ID have 'HAS_EXC_FLAG' set.
     #[must_use]
     pub const fn has_exc(&self) -> bool {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Whether opcode ID have 'HAS_PURE_FLAG' set.
#[must_use]
pub const fn has_exc(&self) -> bool {
/// Whether opcode ID have 'HAS_EXC_FLAG' set.
#[must_use]
pub const fn has_exc(&self) -> bool {
🤖 Prompt for AI Agents
In compiler/core/src/opcode.rs around lines 521 to 523, the doc comment above
pub const fn has_exc(&self) -> bool is incorrect and looks copy-pasted from
another method; update the comment to correctly describe this method (e.g.,
"Whether opcode ID has 'HAS_EXC_FLAG' set.") and fix grammar ("has" instead of
"have") so the comment matches the method name and purpose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is intended

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the generation script to see how has_exc method is generated
cat scripts/gen_opcodes.py | grep -A 10 -B 5 "has_exc\|HAS_PURE"

Length of output: 1411


🏁 Script executed:

#!/bin/bash
# Search for any references to HAS_PURE_FLAG or has_exc in the codebase
rg -n "HAS_PURE|has_exc" --type=py --type=rust -C 3

Length of output: 5081


@ShaharNaveh, understood! I can see from the generation script that this is intentional:

def generate_has_exc(self) -> str:
    return self._generate_has_attr("exc", flag_override="HAS_PURE_FLAG")

The method name has_exc() reflects the semantic meaning (opcodes related to exception handling), while the doc comment accurately documents which CPython flag is actually being checked (HAS_PURE_FLAG). This design choice maintains alignment with CPython's internal flag naming while providing a more descriptive API method name.

Thank you for the clarification!

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
stdlib/src/opcode.rs (1)

26-26: Track the TODO for auto-generated Instruction enum.

This TODO references future work to leverage the auto-generated Instruction enum. Consider tracking this as a follow-up task.

Do you want me to open a new issue to track this task?

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db2775a and 7fa003b.

📒 Files selected for processing (1)
  • stdlib/src/opcode.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • stdlib/src/opcode.rs
{vm,stdlib}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use RustPython macros (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • stdlib/src/opcode.rs
🧬 Code graph analysis (1)
stdlib/src/opcode.rs (3)
scripts/gen_opcodes.py (1)
  • Instruction (81-119)
compiler/core/src/bytecode.rs (1)
  • try_from (814-820)
compiler/core/src/opcode.rs (8)
  • is_valid (247-249)
  • has_arg (273-426)
  • has_const (430-435)
  • has_name (439-465)
  • has_jump (469-487)
  • has_free (491-500)
  • has_local (504-519)
  • has_exc (523-542)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (3)
stdlib/src/opcode.rs (3)

57-59: LGTM! TypeError preservation implemented correctly.

The error handling now correctly preserves TypeError when the opcode argument is not an integer, while only converting Instruction::try_from failures to ValueError. This directly addresses the feedback from the previous review.


14-14: LGTM! Correct type for the flag.

Changing ENABLE_SPECIALIZATION from i8 to u8 is appropriate for a boolean-like flag constant.


71-101: LGTM! Consistent pattern across all predicates.

The has_* functions follow a consistent pattern using OpcodeId::try_from(opcode).is_ok_and(|oid| !oid.is_pseudo() && oid.has_*()), which correctly filters out pseudo-opcodes and delegates to the generated predicates.

#[pyfunction]
fn is_valid(opcode: i32) -> bool {
Opcode::is_valid(opcode)
OpcodeId::try_from(opcode).is_ok()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Consider filtering pseudo-opcodes in is_valid for consistency.

All the has_* predicates (lines 71-101) explicitly filter out pseudo-opcodes with !oid.is_pseudo(), but is_valid does not. Pseudo-opcodes are typically not valid for runtime bytecode execution. For consistency with the other predicates and to match typical CPython semantics, is_valid should likely also exclude pseudo-opcodes.

Apply this diff to filter pseudo-opcodes:

 fn is_valid(opcode: i32) -> bool {
-    OpcodeId::try_from(opcode).is_ok()
+    OpcodeId::try_from(opcode).is_ok_and(|oid| !oid.is_pseudo())
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
OpcodeId::try_from(opcode).is_ok()
fn is_valid(opcode: i32) -> bool {
OpcodeId::try_from(opcode).is_ok_and(|oid| !oid.is_pseudo())
}
🤖 Prompt for AI Agents
In stdlib/src/opcode.rs around line 66, update the is_valid implementation to
exclude pseudo-opcodes like the other has_* predicates: after converting opcode
to an OpcodeId via try_from, return true only if the conversion succeeds and the
resulting OpcodeId is not a pseudo-opcode (i.e., check !oid.is_pseudo()); ensure
you handle the Err case from try_from by returning false.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
stdlib/src/opcode.rs (1)

31-41: Preserve the CPython TypeError message.

Line 39 wraps PyInt::try_to_primitive::<i32> in an empty TypeError(""); CPython raises TypeError("'int' object cannot be interpreted as an integer"). Emitting an empty message is misleading for users. Please propagate the original conversion error (or at least set a meaningful message matching CPython).

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fa003b and 742b709.

⛔ Files ignored due to path filters (3)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/dis.py is excluded by !Lib/**
  • Lib/test/test__opcode.py is excluded by !Lib/**
📒 Files selected for processing (8)
  • .cspell.dict/cpython.txt (1 hunks)
  • .gitattributes (1 hunks)
  • compiler/core/src/lib.rs (1 hunks)
  • compiler/core/src/opcode.rs (1 hunks)
  • compiler/core/src/opcodes.rs (1 hunks)
  • scripts/gen_opcodes.py (1 hunks)
  • stdlib/src/opcode.rs (3 hunks)
  • vm/src/lib.rs (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .cspell.dict/cpython.txt
🚧 Files skipped from review as they are similar to previous changes (3)
  • compiler/core/src/lib.rs
  • .gitattributes
  • vm/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • stdlib/src/opcode.rs
  • compiler/core/src/opcodes.rs
  • compiler/core/src/opcode.rs
{vm,stdlib}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use RustPython macros (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • stdlib/src/opcode.rs
🧬 Code graph analysis (4)
stdlib/src/opcode.rs (2)
compiler/core/src/bytecode.rs (1)
  • stack_effect (1391-1516)
compiler/core/src/opcodes.rs (32)
  • try_from (993-1001)
  • try_from (1007-1015)
  • try_from (1021-1029)
  • try_from (1035-1043)
  • try_from (1049-1057)
  • try_from (1063-1071)
  • try_from (1077-1084)
  • try_from (1090-1098)
  • try_from (1104-1112)
  • try_from (1118-1126)
  • try_from (1132-1140)
  • try_from (1146-1154)
  • try_from (1250-1258)
  • try_from (1264-1272)
  • try_from (1278-1286)
  • try_from (1292-1300)
  • is_valid (549-551)
  • is_valid (1242-1244)
  • has_arg (299-441)
  • has_arg (1177-1189)
  • has_const (445-450)
  • has_const (1193-1195)
  • has_name (523-545)
  • has_name (1230-1238)
  • has_jump (486-502)
  • has_jump (1218-1220)
  • has_free (473-482)
  • has_free (1212-1214)
  • has_local (506-519)
  • has_local (1224-1226)
  • has_exc (454-469)
  • has_exc (1199-1208)
compiler/core/src/opcodes.rs (1)
stdlib/src/opcode.rs (8)
  • has_arg (120-122)
  • has_const (125-127)
  • has_exc (150-152)
  • has_free (140-142)
  • has_jump (135-137)
  • has_local (145-147)
  • has_name (130-132)
  • is_valid (115-117)
compiler/core/src/opcode.rs (1)
compiler/core/src/opcodes.rs (16)
  • try_from (993-1001)
  • try_from (1007-1015)
  • try_from (1021-1029)
  • try_from (1035-1043)
  • try_from (1049-1057)
  • try_from (1063-1071)
  • try_from (1077-1084)
  • try_from (1090-1098)
  • try_from (1104-1112)
  • try_from (1118-1126)
  • try_from (1132-1140)
  • try_from (1146-1154)
  • try_from (1250-1258)
  • try_from (1264-1272)
  • try_from (1278-1286)
  • try_from (1292-1300)
scripts/gen_opcodes.py (1)
compiler/core/src/opcodes.rs (1)
  • deopt (224-295)
🪛 Flake8 (7.3.0)
scripts/gen_opcodes.py

[error] 24-24: 'opcode_metadata_generator.cflags' imported but unused

(F401)


[error] 329-329: do not assign a lambda expression, use a def

(E731)


[error] 329-329: ambiguous variable name 'l'

(E741)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
🔇 Additional comments (5)
stdlib/src/opcode.rs (1)

52-110: Opcode::try_from accepts pseudocode IDs—mirrors CPython’s ValueError.

Great job aligning error handling and stack effect logic with the new opcode split. Using Opcode::try_from here cleanly differentiates real and pseudo opcodes and keeps CPython-compatible validation.

compiler/core/src/opcode.rs (2)

9-27: impl_try_from! drops pseudo-only values silently.

The macro tries RealOpcode::try_from first; if that succeeds we never see pseudos. But RealOpcode::try_from returns Err(()) for pseudo IDs, so the fallback works—no issue. Confirmed.


9-27: Macro-generated TryFrom keeps conversions DRY.

Nice reuse of the macro to wire every integer type into Opcode; keeps parity with CPython-style APIs with minimal duplication.

compiler/core/src/opcodes.rs (2)

452-469: Confirm flag mapping for has_exc.

Using CPython’s HAS_PURE flag to back RustPython’s has_exc is intentional per prior discussion—thanks for preserving the semantics.


547-551: is_valid excludes pseudo-opcodes.

The real-opcode is_valid stops at 0..=118 | 149..=222 | 236..=254, so pseudos are properly filtered here; consistent with stdlib predicates once patched.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
scripts/gen_opcodes.py (3)

24-24: Drop the unused cflags import.

The import triggers flake8 F401 and is never referenced in the script.

Apply this diff:

-from opcode_metadata_generator import cflags

53-54: Replace lambda assignments with proper methods.

Assigning lambdas to class attributes violates PEP 8 E731 and reduces maintainability. These monkey-patch StackOffset methods used by CPython's stack analyzer.

Apply this diff to define proper functions:

-StackOffset.pop = lambda self, item: self.popped.append(_var_size(item))
-StackOffset.push = lambda self, item: self.pushed.append(_var_size(item))
+def _stack_offset_pop(self, item):
+    self.popped.append(_var_size(item))
+
+def _stack_offset_push(self, item):
+    self.pushed.append(_var_size(item))
+
+StackOffset.pop = _stack_offset_pop
+StackOffset.push = _stack_offset_push

344-344: Replace lambda and rename the ambiguous parameter.

The lambda assigned to f violates flake8 E731, and its parameter l is ambiguous (E741). Use a descriptive name like deopt_list and define a proper function.

Apply this diff:

-        f = lambda l: "|".join(f"Self::{x}" for x in l)
+        def format_deopt_variants(deopt_list):
+            return "|".join(f"Self::{x}" for x in deopt_list)
+        
         branches = ",\n".join(
-            f"{f(deopt)} => Self::{name}" for name, deopt in sorted(deopts.items())
+            f"{format_deopt_variants(deopt)} => Self::{name}" for name, deopt in sorted(deopts.items())
         )
🧹 Nitpick comments (1)
scripts/gen_opcodes.py (1)

234-279: Document the TryFrom generation logic.

The impl_try_from_nums property contains complex branching logic (lines 245-256) that determines whether narrowing/widening conversions require .try_into(), .into(), or direct use. A brief comment explaining the four cases (sign mismatch, widen, exact match, narrow) would aid future maintainers.

Consider adding a docstring or inline comments like:

def fn_prefix(typ: str):
    """
    Generate the TryFrom prefix based on type compatibility.
    
    Cases:
    1. Sign mismatch (u8 vs i8): requires try_into
    2. Widening (u8 to u16): requires try_into for safety
    3. Exact match (u8 to u8): use directly
    4. Narrowing (u16 to u8): use into (infallible)
    """
    s_val = int(self.typ[1:])
    # ... rest of implementation
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 742b709 and 2a8fb1e.

📒 Files selected for processing (2)
  • compiler/core/src/opcodes.rs (1 hunks)
  • scripts/gen_opcodes.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • compiler/core/src/opcodes.rs
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/gen_opcodes.py (1)
compiler/core/src/opcodes.rs (1)
  • deopt (224-295)
🪛 Flake8 (7.3.0)
scripts/gen_opcodes.py

[error] 24-24: 'opcode_metadata_generator.cflags' imported but unused

(F401)


[error] 344-344: do not assign a lambda expression, use a def

(E731)


[error] 344-344: ambiguous variable name 'l'

(E741)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (macos-latest)
🔇 Additional comments (2)
scripts/gen_opcodes.py (2)

295-321: No action needed: .to_c() produces valid Rust expressions


383-386: Regex substitution is safe; no unintended replacements detected
The existing compiler/core/src/opcodes.rs shows only intended enum variants (RealOpcode, PseudoOpcode, etc.) in comments and code, and all Rust keywords (impl, fn, pub, const, match, Self) remain intact.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coolreader18 What will be the best way to solve it?

@ShaharNaveh Do we need to have different type for Instruction and RealOpCode? Otherwise, just in case, can it be defined like:

struct RealOpCode(bytecode::Instruction);

Comment on lines 1001 to 1166
impl TryFrom<i8> for RealOpcode {
type Error = ();

fn try_from(raw: i8) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<i16> for RealOpcode {
type Error = ();

fn try_from(raw: i16) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<i32> for RealOpcode {
type Error = ();

fn try_from(raw: i32) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<i64> for RealOpcode {
type Error = ();

fn try_from(raw: i64) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<i128> for RealOpcode {
type Error = ();

fn try_from(raw: i128) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<isize> for RealOpcode {
type Error = ();

fn try_from(raw: isize) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<u8> for RealOpcode {
type Error = ();

fn try_from(id: u8) -> Result<Self, Self::Error> {
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<u16> for RealOpcode {
type Error = ();

fn try_from(raw: u16) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<u32> for RealOpcode {
type Error = ();

fn try_from(raw: u32) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<u64> for RealOpcode {
type Error = ();

fn try_from(raw: u64) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<u128> for RealOpcode {
type Error = ();

fn try_from(raw: u128) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}

impl TryFrom<usize> for RealOpcode {
type Error = ();

fn try_from(raw: usize) -> Result<Self, Self::Error> {
let id = raw.try_into().map_err(|_| ())?;
if Self::is_valid(id) {
// SAFETY: We just validated that we have a valid opcode id.
Ok(unsafe { Self::new_unchecked(id) })
} else {
Err(())
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this applicable?

https://docs.rs/num_enum/latest/num_enum/derive.TryFromPrimitive.html

Also for the psuedo opcode

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is great!
will modify the code

@ShaharNaveh
Copy link
Collaborator Author

@ShaharNaveh Do we need to have different type for Instruction and RealOpCode? Otherwise, just in case, can it be defined like:

struct RealOpCode(bytecode::Instruction);

Yes, because opcode an instruction is comprised out of an opcode and a optional oparg.

imo opcode should be it's own enum.

An instruction should contain a RealOpcode + Option<T>.

where T can be:

  • Oparg (which is a newtype for u32)
  • Typed OpargType for certain opcodes (i.e. Resume opcode can only have values from 0-3, each represents different action to be taken by the VM).

For Instruction we have a Instruction::new constructor that looks like:

impl<T: OpargType> Instruction<T> {
    pub fn new(opcode: RealOpcode, raw_oparg: u32) -> Result<Self, MarshalError> {
        if !opcode.has_arg() {
           return Ok(Self {opcode, oparg: None });
        }
        let oparg = match opcode {
          RealOpcode::Resume => ResumeOpcode::try_from(raw_oparg)?,
          // More typed oparg checks for specific opcodes
         _ => Oparg(raw_oparg)
        };
        Ok(Self { opcode, oparg: Some(oparg) })
    }
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
stdlib/src/opcode.rs (1)

131-133: Filter pseudo-opcodes in is_valid to match CPython semantics.

Past review comments correctly noted that is_valid should reject pseudo-opcodes, as they are not valid for runtime bytecode execution and CPython's opcode.is_valid excludes them. Currently, Opcode::try_from(opcode).is_ok() returns true for pseudo-opcodes.

Apply this diff to filter pseudo-opcodes:

 #[pyfunction]
 fn is_valid(opcode: i32) -> bool {
-    Opcode::try_from(opcode).is_ok()
+    matches!(Opcode::try_from(opcode), Ok(Opcode::Real(_)))
 }
scripts/gen_opcodes.py (1)

53-78: Guard against empty input in group_ranges.

The function unconditionally accesses nums[0] on line 73, raising IndexError if the input iterable is empty. While CPython's opcode metadata likely never produces empty ranges in practice, defensive coding would prevent potential future issues.

Apply this diff to add an early return:

 def group_ranges(it: "Iterable[int]") -> "Iterator[range]":
     """
     Group consecutive numbers into ranges.
 
     Parameters
     ----------
     it : Iterable[int]
         Numbers to group into ranges.
 
     Notes
     -----
     Numbers in `it` must be sorted in ascending order.
 
     Examples
     --------
     >>> nums = [0, 1, 2, 3, 17, 18, 42, 50, 51]
     >>> list(group_ranges(nums))
     [range(0, 4), range(17, 19), range(42, 43), range(50, 52)]
     """
     nums = list(it)
+    if not nums:
+        return
     start = prev = nums[0]
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ae0c7e and aca4508.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • compiler/core/Cargo.toml (1 hunks)
  • compiler/core/src/opcode.rs (1 hunks)
  • compiler/core/src/opcodes.rs (1 hunks)
  • scripts/gen_opcodes.py (1 hunks)
  • stdlib/src/opcode.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • stdlib/src/opcode.rs
  • compiler/core/src/opcode.rs
  • compiler/core/src/opcodes.rs
{vm,stdlib}/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use RustPython macros (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • stdlib/src/opcode.rs
🧬 Code graph analysis (3)
stdlib/src/opcode.rs (3)
compiler/core/src/bytecode.rs (2)
  • stack_effect (1391-1516)
  • try_from (814-820)
compiler/core/src/opcode.rs (2)
  • try_from (13-20)
  • try_from (26-29)
compiler/core/src/opcodes.rs (14)
  • has_arg (303-445)
  • has_arg (1026-1038)
  • has_const (449-454)
  • has_const (1042-1044)
  • has_name (527-549)
  • has_name (1079-1087)
  • has_jump (490-506)
  • has_jump (1067-1069)
  • has_free (477-486)
  • has_free (1061-1063)
  • has_local (510-523)
  • has_local (1073-1075)
  • has_exc (458-473)
  • has_exc (1048-1057)
scripts/gen_opcodes.py (1)
compiler/core/src/opcodes.rs (1)
  • deopt (228-299)
compiler/core/src/opcodes.rs (1)
stdlib/src/opcode.rs (8)
  • has_arg (136-138)
  • has_const (141-143)
  • has_exc (166-168)
  • has_free (156-158)
  • has_jump (151-153)
  • has_local (161-163)
  • has_name (146-148)
  • is_valid (131-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with rustfmt and clippy
🔇 Additional comments (8)
compiler/core/Cargo.toml (1)

20-21: LGTM! Dependencies align with the new opcode system.

The addition of num_enum and num-traits appropriately supports the generated opcode infrastructure. num_enum provides the TryFromPrimitive derive macro used in opcodes.rs, and num-traits supplies the PrimInt trait bound for the error constructor.

stdlib/src/opcode.rs (3)

7-11: LGTM! Updated imports and type change.

The imports now correctly reference the new Opcode, PseudoOpcode, and RealOpcode types from the vm::opcode module. The change of ENABLE_SPECIALIZATION from i8 to u8 aligns with CPython's unsigned value and removes unnecessary signedness.


23-117: Stack effect implementation aligns well with CPython.

The refactored stack_effect correctly handles RealOpcode and PseudoOpcode separately, mirrors CPython's special-case treatment of ExitInitCheck, rejects specialized opcodes via deopt(), and properly computes stack effects for pseudo-opcodes with jump-dependent logic.


119-168: LGTM! Macro-based real opcode checks.

The real_opcode_check! macro correctly filters out pseudo-opcodes by attempting conversion to RealOpcode via u8::try_from and delegates to the appropriate has_* predicate. This ensures all has_arg, has_const, has_name, has_jump, has_free, has_local, and has_exc functions only operate on real opcodes, consistent with CPython semantics.

scripts/gen_opcodes.py (1)

112-343: Well-structured generator design.

The abstraction via InstructionsMeta, RealInstructions, and PseudoInstructions cleanly separates concerns and generates comprehensive Rust opcode scaffolding including predicates, stack effects, deoptimization mappings, and TryFrom conversions. The CPython variant-to-Rust identifier substitution and automatic cargo fmt invocation are nice touches for maintainability.

compiler/core/src/opcodes.rs (1)

1-1109: Generated opcode definitions are comprehensive and correct.

The auto-generated RealOpcode (repr(u8), 223 variants) and PseudoOpcode (repr(u16), 12 variants) enums provide exhaustive CPython 3.13.7 opcode coverage. The implementation includes:

  • deopt() mappings for specialized opcodes
  • Predicate methods (has_arg, has_const, has_exc, has_free, has_jump, has_local, has_name)
  • Stack effect calculations (num_popped, num_pushed) with oparg-dependent logic
  • is_valid() range checks
  • TryFromPrimitive conversions with custom MarshalError constructor

The generated code follows Rust best practices (const methods where possible, exhaustive matching, proper repr annotations).

compiler/core/src/opcode.rs (2)

1-8: LGTM! Clean opcode wrapper design.

The Opcode enum unifies RealOpcode and PseudoOpcode under a single type, enabling unified handling while preserving type safety. The public re-export of RealOpcode and PseudoOpcode maintains flexibility for code that needs direct access to the underlying types.


32-48: LGTM! Macro reduces TryFrom boilerplate.

The impl_try_from! macro generates consistent TryFrom implementations for signed and unsigned integer types by first converting to u16 and delegating to the existing TryFrom<u16> implementation. This reduces code duplication and ensures uniform error handling.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/gen_opcodes.py (1)

49-50: Avoid assigning lambdas to methods (E731); use named defs.

Define helpers and assign them to StackOffset for clarity and lint compliance. (Similar to prior feedback.)

-StackOffset.pop = lambda self, item: self.popped.append(_var_size(item))
-StackOffset.push = lambda self, item: self.pushed.append(_var_size(item))
+def _stackoffset_pop(self, item):
+    self.popped.append(_var_size(item))
+
+def _stackoffset_push(self, item):
+    self.pushed.append(_var_size(item))
+
+StackOffset.pop = _stackoffset_pop
+StackOffset.push = _stackoffset_push
🧹 Nitpick comments (4)
scripts/gen_opcodes.py (3)

14-21: Make CPython path configurable and fail with a clear message.

Hardcoding CPYTHON_PATH risks opaque failures. Allow an env override and validate the path.

-CPYTHON_PATH = (
-    pathlib.Path(__file__).parents[2] / "cpython"
-)
+CPYTHON_PATH = pathlib.Path(
+    sys.environ.get("RUSTPYTHON_CPYTHON_PATH", "")
+) or (pathlib.Path(__file__).parents[2] / "cpython")
+
+if not CPYTHON_PATH.exists():
+    sys.exit(f"cpython checkout not found at: {CPYTHON_PATH}. "
+             "Set RUSTPYTHON_CPYTHON_PATH to override.")

253-256: Escape and order tokens in the regex replacement.

Unescaped alternation can mis-match; ordering prevents short-name shadowing.

-    replacements = {name: enum_variant_name(name) for name in analysis.opmap}
-    inner_pattern = "|".join(replacements)
+    replacements = {name: enum_variant_name(name) for name in analysis.opmap}
+    # Longest-first to avoid partial matches (e.g., LOAD vs LOAD_CONST)
+    names = sorted(replacements.keys(), key=len, reverse=True)
+    inner_pattern = "|".join(re.escape(n) for n in names)
     pattern = re.compile(rf"\b({inner_pattern})\b")
     out = pattern.sub(lambda m: replacements[m.group(0)], out)

258-259: Surface formatting failures and keep output consistent.

Use check=True to fail fast; if cargo is missing, at least warn.

-    print("Running `cargo fmt`")
-    subprocess.run(["cargo", "fmt"], cwd=ROOT)
+    print("Running `cargo fmt`")
+    try:
+        subprocess.run(["cargo", "fmt"], cwd=ROOT, check=True)
+    except Exception as e:
+        print(f"Warning: `cargo fmt` failed: {e}", file=sys.stderr)
compiler/core/src/opcodes.rs (1)

227-299: Optional: simplify deopt for readability.

Current Some(match ... _ => return None) works; consider a direct match to avoid early return inside the expression.

-    pub const fn deopt(&self) -> Option<Self> {
-        Some(match *self {
+    pub const fn deopt(&self) -> Option<Self> {
+        match *self {
             /* arms unchanged */
-            _ => return None,
-        })
+            _ => return None,
+        }
     }
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aca4508 and 3daec5a.

📒 Files selected for processing (2)
  • compiler/core/src/opcodes.rs (1 hunks)
  • scripts/gen_opcodes.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • compiler/core/src/opcodes.rs
🧬 Code graph analysis (2)
compiler/core/src/opcodes.rs (1)
stdlib/src/opcode.rs (7)
  • has_arg (136-138)
  • has_const (141-143)
  • has_exc (166-168)
  • has_free (156-158)
  • has_jump (151-153)
  • has_local (161-163)
  • has_name (146-148)
scripts/gen_opcodes.py (2)
compiler/core/src/marshal.rs (1)
  • instructions (188-195)
compiler/core/src/opcodes.rs (1)
  • deopt (228-299)
🪛 Flake8 (7.3.0)
scripts/gen_opcodes.py

[error] 12-12: 'collections.abc.Iterable' imported but unused

(F401)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with rustfmt and clippy
🔇 Additional comments (2)
compiler/core/src/opcodes.rs (2)

9-12: Good use of num_enum for fallible conversions.

Derive + error_type hook is clean and const-friendly. LGTM.


988-1005: Approve PseudoOpcode derivation; num_enum dependency confirmed
repr(u16) + TryFromPrimitive matches design and num_enum is declared in Cargo.toml.

import typing

if typing.TYPE_CHECKING:
from collections.abc import Iterable, Iterator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix unused import (flake8 F401).

Iterable isn’t used. Keep only Iterator.

-if typing.TYPE_CHECKING:
-    from collections.abc import Iterable, Iterator
+if typing.TYPE_CHECKING:
+    from collections.abc import Iterator
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from collections.abc import Iterable, Iterator
if typing.TYPE_CHECKING:
from collections.abc import Iterator
🧰 Tools
🪛 Flake8 (7.3.0)

[error] 12-12: 'collections.abc.Iterable' imported but unused

(F401)

🤖 Prompt for AI Agents
In scripts/gen_opcodes.py around line 12, the import statement brings in both
Iterable and Iterator but Iterable is unused; remove Iterable from the import
and keep only Iterator so the line imports Iterator from collections.abc to
satisfy flake8 F401.

Comment on lines +248 to +250
const fn new_invalid_bytecode<T: num_traits::int::PrimInt>(_: T) -> MarshalError {{
MarshalError::InvalidBytecode
}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Confirm num_traits is declared for generated code.

new_invalid_bytecode uses num_traits::int::PrimInt; ensure compiler/core/Cargo.toml includes num_traits.

You can verify with:


🏁 Script executed:

#!/bin/bash
rg -nP '^\s*num_traits\s*=' -g 'compiler/**/Cargo.toml'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# List all Cargo.toml files under compiler/
find compiler -type f -name Cargo.toml

# Search for any num_traits references in those toml files
rg -n 'num_traits' -g 'compiler/**/*.toml'

Length of output: 127


Add num_traits dependency to compiler/core/Cargo.toml: Generated code invokes num_traits::int::PrimInt but no num_traits entry exists in any compiler/*.toml—add it under [dependencies].

🤖 Prompt for AI Agents
In scripts/gen_opcodes.py around lines 248 to 250, the generated Rust uses
num_traits::int::PrimInt but compiler/core/Cargo.toml has no num-traits
dependency; add num-traits to the [dependencies] section (e.g. num-traits =
"0.2") in compiler/core/Cargo.toml, run cargo update/build to ensure the crate
resolves, and regenerate/verify the generated code compiles.

@youknowone
Copy link
Member

Somehow, our Instruction is a u8 backed type because Arg is a zero-sized type. This is a trick to type the argument but not actually owning a value.
So currently an Instruction is not RealOpcode + Option.
If a Python opcode is a byte value of the instruction, one must be able to be backed by the other one.

@ShaharNaveh
Copy link
Collaborator Author

ShaharNaveh commented Oct 14, 2025

Somehow, our Instruction is a u8 backed type because Arg is a zero-sized type. This is a trick to type the argument but not actually owning a value. So currently an Instruction is not RealOpcode + Option. If a Python opcode is a byte value of the instruction, one must be able to be backed by the other one.

So, if I understand it correctly we must keep Instruction as an enum and make the variants to be equal to their opcode. So there's no point in having an Opcode enum, correct?

@youknowone
Copy link
Member

I am currently not perfectly understanding how they are different in Python.
I prefer that way if it is possible. Otherwise when a conversion is required, aligning them with rust language features will be better than using a script

@ShaharNaveh
Copy link
Collaborator Author

I am currently not perfectly understanding how they are different in Python. I prefer that way if it is possible.

Some opcodes that we have does not exist in Cpython & Some were changed. I thought it would be better if we would use CPython Lib/dis.py and test it with Lib/test_dis.py, as CPython tests are more in-depth than what we currently have. imo we can mimic CPython behavior of opcodes tiers and adaptive bytecode execution, it will probably improve our performance.

Otherwise when a conversion is required aligning them with rust language features will be better than using a script

Completely agree. Using Rust safety features is better than forcing rust to behave like C.


For now, I'll convert this PR to draft and try to see if I'm able to auto-generate the Instruction enum in a proper way

@ShaharNaveh ShaharNaveh marked this pull request as draft October 15, 2025 07:51
@youknowone
Copy link
Member

I agree it will be great if dis.py perfectly work.

imo we can mimic CPython behavior of opcodes tiers and adaptive bytecode execution, it will probably improve our performance.

I agree here too.

But it may not be always possible especially when CPython is updated. Because CPython doesn't guarantee bytecode compatibility, every python version can have different bytecode.

We can patch bytecode to be aligned to CPython. But if we have incompatibility,
We'd better to patch opcode.py, dis.py and so on to work with our actual bytecode.

@ShaharNaveh
Copy link
Collaborator Author

But it may not be always possible especially when CPython is updated. Because CPython doesn't guarantee bytecode compatibility, every python version can have different bytecode.

We can patch bytecode to be aligned to CPython. But if we have incompatibility, We'd better to patch opcode.py, dis.py and so on to work with our actual bytecode.

Sounds good. I'll invest more time to see how we can auto generate the Instruction enum. and if it'll be too hard I'll try to make dis.py to work in another way 🥲

@youknowone
Copy link
Member

Now Python has exception table for code object, which is related to the newer except instructions.
And the new instructions are related to the new psuedo instructions

@ShaharNaveh
Copy link
Collaborator Author

Now Python has exception table for code object, which is related to the newer except instructions. And the new instructions are related to the new psuedo instructions

Are we aiming now for Python 3.14? what are the steps to achieve it? Updating the Instructions can't be only thing needed, right?

@youknowone
Copy link
Member

If you are interested in 3.14 support, #5484 is the guideline
The key library is importlib and site. Then we have to fix every other broken libraries.
Due to recent opcode work, probably opcode will be another key library too.

@youknowone
Copy link
Member

Oh, to prevent miscommunication, the exception table is not related to 3.14. It is probably a 3.11 feature which we didn't implement yet.

@ShaharNaveh
Copy link
Collaborator Author

Oh, to prevent miscommunication, the exception table is not related to 3.14. It is probably a 3.11 feature which we didn't implement yet.

No worries. I think that this PR will stay open for a while as the changes required to use the updated instructions is very large. meanwhile I'll open small PRs to get us there slowly but surely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants