Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 11, 2025

Summary by CodeRabbit

  • Performance
    • Enhanced constructor implementation for built-in bytes, frozenset, string, and tuple types to improve object creation performance. Optimized code paths now handle exact type instances with direct returns, efficient singleton handling for empty collections, and streamlined construction flows that reduce overhead in common usage scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 11, 2025

Walkthrough

This PR refactors constructor implementations for built-in Python types (bytes, frozenset, str, tuple) in RustPython, changing how constructor arguments are represented and processed. It moves construction logic from shared helpers into type-specific optimized paths, removes the ByteInnerNewOptions.get_bytes() method, and updates the bytes protocol to use a new FuncArgs-based approach.

Changes

Cohort / File(s) Summary
Constructor argument type refactoring
crates/vm/src/builtins/bytes.rs, crates/vm/src/builtins/set.rs, crates/vm/src/builtins/tuple.rs
Changed type Args from wrapper types (ByteInnerNewOptions, OptionalArg<PyObjectRef>) to direct container types (Vec<u8>, Vec<PyObjectRef>). Added early-return optimizations in slot_new for exact types (empty singletons, direct returns for existing objects). Updated py_new signatures to accept the new Args types and construct payloads directly.
String constructor optimization
crates/vm/src/builtins/str.rs
Introduced early-return optimization for exact str type with single str argument. Refactored slot_new to delegate payload creation to new internal py_new helper, separating constructor logic from type-wrapping logic.
Shared method removal
crates/vm/src/bytes_inner.rs
Removed public get_bytes() method from ByteInnerNewOptions impl; construction logic now dispersed into type-specific paths. Incidental import cleanup.
Protocol bytes construction update
crates/vm/src/protocol/object.rs
Replaced ByteInnerNewOptions-based bytes construction with new FuncArgs-based Constructor::slot_new path. Updated imports to use FuncArgs and Constructor.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Multiple interconnected files with public API changes: Constructor type Args changes in bytes, frozenset, and tuple affect both signatures and internal logic across multiple files.
  • Optimization paths require validation: Early-return logic for empty singletons, exact type checks, and __bytes__/object-identity short-circuits must be verified for correctness.
  • Removal of shared helper: The deleted get_bytes() method shifts responsibility to individual constructors; ensure no other code paths depend on it.
  • Pattern consistency: Similar changes applied across multiple types (repetition reduces complexity somewhat), but each type has unique logic requiring individual review.

Specific areas requiring extra attention:

  • Validation of singleton and exact-type short-circuit logic in slot_new methods (bytes, frozenset, tuple)
  • Correctness of py_new implementations accepting Vec<u8> and Vec<PyObjectRef> and properly constructing payloads
  • Verification that the new FuncArgs-based bytes() flow in protocol/object.rs maintains parity with the old ByteInnerNewOptions path
  • Confirmation that removal of get_bytes() does not break any call sites outside this diff

Possibly related PRs

Suggested reviewers

  • ShaharNaveh
  • coolreader18

Poem

🐰 Hoppy constructor paths now shine so bright,
With Vec and singletons taking flight,
Exact types zoom past, no detours in sight,
While shared get_bytes fades into the night.
RustPython's types dance lean and light!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Py new' is too vague and generic; it does not convey meaningful information about the substantive changes across multiple builtins (bytes, frozenset, str, tuple) and their constructor signatures. Use a more descriptive title that captures the main objective, such as 'Refactor constructor argument types for builtin types' or 'Introduce py_new optimization paths for bytes, frozenset, str, and tuple'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review December 11, 2025 14:09
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f496c9 and c5abc44.

📒 Files selected for processing (6)
  • crates/vm/src/builtins/bytes.rs (1 hunks)
  • crates/vm/src/builtins/set.rs (1 hunks)
  • crates/vm/src/builtins/str.rs (1 hunks)
  • crates/vm/src/builtins/tuple.rs (1 hunks)
  • crates/vm/src/bytes_inner.rs (1 hunks)
  • crates/vm/src/protocol/object.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/builtins/set.rs
  • crates/vm/src/builtins/tuple.rs
  • crates/vm/src/builtins/bytes.rs
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/bytes_inner.rs
  • crates/vm/src/protocol/object.rs
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Add a `# TODO: RUSTPYTHON` comment when modifications are made to files in the `Lib/` directory
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/protocol/object.rs
🧬 Code graph analysis (4)
crates/vm/src/builtins/set.rs (5)
crates/vm/src/builtins/bytes.rs (2)
  • slot_new (97-149)
  • py_new (151-153)
crates/vm/src/builtins/tuple.rs (3)
  • slot_new (115-146)
  • input (122-122)
  • py_new (148-152)
crates/vm/src/builtins/complex.rs (1)
  • slot_new (154-167)
crates/vm/src/builtins/float.rs (1)
  • slot_new (134-147)
crates/vm/src/builtins/list.rs (1)
  • from_iter (47-49)
crates/vm/src/builtins/bytes.rs (3)
crates/vm/src/builtins/str.rs (5)
  • slot_new (353-366)
  • py_new (368-386)
  • elements (2118-2120)
  • elements (2229-2231)
  • elements (2351-2353)
crates/vm/src/builtins/tuple.rs (2)
  • slot_new (115-146)
  • py_new (148-152)
crates/vm/src/builtins/float.rs (1)
  • slot_new (134-147)
crates/vm/src/builtins/str.rs (1)
crates/vm/src/stdlib/typing.rs (1)
  • args (157-159)
crates/vm/src/protocol/object.rs (2)
crates/vm/src/builtins/bytes.rs (4)
  • from (44-48)
  • from (52-54)
  • obj (114-114)
  • slot_new (97-149)
crates/vm/src/builtins/object.rs (1)
  • slot_new (35-111)
🔇 Additional comments (9)
crates/vm/src/bytes_inner.rs (1)

3-7: LGTM - Import cleanup aligns with removed functionality.

The import changes (removing identifier, PyTypeRef, and simplifying pystr import) are consistent with the removal of the get_bytes() method that previously depended on these. The remaining imports support the get_bytearray_inner() method which is still used by the bytes constructor.

crates/vm/src/builtins/str.rs (2)

353-366: LGTM - Optimization for exact str type is correct.

The early return optimization correctly:

  1. Checks for exact str type (not subclasses)
  2. Ensures single positional argument with no kwargs
  3. Verifies the argument is already an exact str

This matches CPython's behavior and avoids unnecessary string copying.


368-386: LGTM - Clean separation of payload creation logic.

The py_new helper correctly handles all construction cases:

  • Decoding with encoding parameter
  • Converting via str() without encoding
  • Empty string for missing input

The separation from slot_new improves code clarity and follows the pattern established in other builtins.

crates/vm/src/builtins/set.rs (1)

921-958: LGTM - Constructor refactor follows the established pattern.

The implementation correctly:

  1. Returns exact frozenset input as-is (line 929-933)
  2. Returns empty frozenset singleton for missing iterable (line 936-938)
  3. Returns empty frozenset singleton for empty result (line 947-950)
  4. Delegates payload creation to py_new which uses from_iter

The type Args = Vec<PyObjectRef> declaration aligns with py_new's signature, while slot_new handles the conversion from FuncArgs to the expected elements vector.

crates/vm/src/protocol/object.rs (2)

7-16: LGTM - Import updates align with new construction pathway.

The import changes correctly add:

  • FuncArgs for constructing function arguments
  • Constructor trait for invoking the slot_new pathway

These support the refactored bytes() method below.


34-43: LGTM - Cleaner construction pathway via Constructor trait.

The refactored bytes() method correctly:

  1. Maintains the PyInt error handling (bytes shouldn't accept int directly via this path)
  2. Creates FuncArgs from the single object argument
  3. Delegates to PyBytes::slot_new which handles all construction logic including __bytes__ method invocation and optimizations

This centralizes the bytes construction logic and eliminates the now-removed get_bytes() pathway.

crates/vm/src/builtins/tuple.rs (1)

112-152: LGTM - Constructor follows the unified pattern across builtins.

The implementation correctly handles all optimization cases:

  1. Returns exact input tuple as-is (lines 121-125)
  2. Returns empty tuple singleton for missing iterable (lines 128-130)
  3. Returns empty tuple singleton for empty result (lines 139-142)
  4. Properly separates slot binding (slot_new) from payload creation (py_new)

The type Args = Vec<PyObjectRef> with py_new accepting elements: Self::Args provides a clean interface while slot_new handles the FuncArgs to Vec conversion.

crates/vm/src/builtins/bytes.rs (2)

94-118: Constructor optimizations look correct and consistent with other builtin types.

The optimization pattern correctly mirrors other builtin constructors (tuple, str, float):

  • Empty singleton returned for exact type with no arguments
  • Exact input bytes returned as-is when no encoding/errors specified
  • Use of downcast_exact properly ensures only exact PyBytes instances bypass construction

139-153: Fallback path and simplified py_new look good.

The structure cleanly separates concerns:

  • slot_new handles all optimization logic and protocol dispatch
  • py_new is reduced to simple construction from Vec<u8>
  • Empty singleton optimization in the fallback path matches the pattern in tuple.rs

Comment on lines +120 to +137
// Handle __bytes__ method - may return PyBytes directly
if let OptionalArg::Present(ref obj) = options.source
&& options.encoding.is_missing()
&& options.errors.is_missing()
&& let Some(bytes_method) = vm.get_method(obj.clone(), identifier!(vm, __bytes__))
{
let bytes = bytes_method?.call((), vm)?;
// If exact bytes type and __bytes__ returns bytes, use it directly
if cls.is(vm.ctx.types.bytes_type)
&& let Ok(b) = bytes.clone().downcast::<PyBytes>()
{
return Ok(b.into());
}
// Otherwise convert to Vec<u8>
let inner = PyBytesInner::try_from_borrowed_object(vm, &bytes)?;
let payload = Self::py_new(&cls, inner.elements, vm)?;
return payload.into_ref_with_type(vm, cls).map(Into::into);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find the PyBytesInner::try_from_borrowed_object implementation
rg -A 20 "fn try_from_borrowed_object" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 238


🏁 Script executed:

# Search for __bytes__ related tests or comments
rg -B 3 -A 3 "__bytes__" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 856


🏁 Script executed:

# Check if there are any tests for bytes constructor with __bytes__
fd -e rs -x rg -l "__bytes__" {} \; | head -20

Repository: RustPython/RustPython

Length of output: 222


🏁 Script executed:

# Look at the downcast usage pattern in this file
rg "downcast" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 794


🏁 Script executed:

# Find PyBytesInner definition and try_from_borrowed_object impl
rg -B 5 -A 20 "impl.*PyBytesInner" crates/vm/src/builtins/bytes.rs | head -80

Repository: RustPython/RustPython

Length of output: 527


🏁 Script executed:

# Search for PyBytesInner enum/struct definition
rg "^(pub )?struct PyBytesInner|^(pub )?enum PyBytesInner" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Find where PyBytesInner::try_from_borrowed_object is actually implemented
ast-grep --pattern 'impl PyBytesInner {
  $$$
  fn try_from_borrowed_object($$$) {
    $$$
  }
  $$$
}'

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Look for the actual implementation - search more broadly
rg "fn try_from_borrowed_object" crates/vm/src/builtins/bytes.rs -B 2 -A 30 | head -100

Repository: RustPython/RustPython

Length of output: 288


🏁 Script executed:

# Look at the beginning/middle of bytes.rs for PyBytesInner impl
head -300 crates/vm/src/builtins/bytes.rs | tail -200

Repository: RustPython/RustPython

Length of output: 5642


🏁 Script executed:

# Search for PyBytesInner impl with try_from_borrowed_object more carefully
rg -B 3 "PyBytesInner::try_from_borrowed_object" crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 812


🏁 Script executed:

# Look at the entire impl block for PyBytesInner
sed -n '/^impl PyBytesInner/,/^impl /p' crates/vm/src/builtins/bytes.rs | head -100

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Search across entire vm/src for PyBytesInner::try_from_borrowed_object implementation
rg -B 2 -A 20 "impl.*PyBytesInner.*\{" crates/vm/src/builtins/bytes.rs | grep -A 25 "try_from_borrowed_object"

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# PyBytesInner might be defined as a type alias - search for its definition
rg "type PyBytesInner|struct PyBytesInner|enum PyBytesInner" crates/vm/src/

Repository: RustPython/RustPython

Length of output: 119


🏁 Script executed:

# Search in function/buffer.rs since learnings mention it
rg -B 2 -A 25 "try_from_borrowed_object" crates/vm/src/function/buffer.rs | head -80

Repository: RustPython/RustPython

Length of output: 2557


🏁 Script executed:

# Look at PyBytesInner implementation in bytes_inner.rs
rg -B 2 -A 25 "fn try_from_borrowed_object" crates/vm/src/bytes_inner.rs

Repository: RustPython/RustPython

Length of output: 881


🏁 Script executed:

# Get context around PyBytesInner impl
rg -B 5 -A 30 "impl PyBytesInner" crates/vm/src/bytes_inner.rs | head -100

Repository: RustPython/RustPython

Length of output: 1354


🏁 Script executed:

# Find bytes_from_object function
rg -B 3 -A 40 "fn bytes_from_object" crates/vm/src/bytes_inner.rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Search entire codebase for bytes_from_object
rg -B 3 -A 40 "fn bytes_from_object"

Repository: RustPython/RustPython

Length of output: 1345


🏁 Script executed:

# Check if it's imported in bytes_inner.rs
rg "bytes_from_object" crates/vm/src/bytes_inner.rs

Repository: RustPython/RustPython

Length of output: 201


🌐 Web query:

CPython bytes constructor __bytes__ return type error message TypeError

💡 Result:

When you call bytes(x), CPython will call x.bytes() (if present) and that method must return a bytes object. If it returns some other type, CPython raises a TypeError such as:

  • "TypeError: bytes returned non-bytes (type X)" (e.g. X == int or str). [1][2]

Fix: make bytes return a bytes object (e.g. return b"..." or return some_str.encode('utf-8')). [3]

Sources:

  • object.bytes doc / pydoc requirement that bytes should return bytes. [1]
  • Python tracker / example error message showing "bytes returned non-bytes (type ...)". [2]
  • Explanation and example usage showing the TypeError and correct return type. [3]

🏁 Script executed:

# Search for tests of __bytes__ returning non-bytes
rg -B 5 -A 10 "__bytes__.*non" crates/

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Look for __bytes__ error tests
fd -e py -x rg -l "__bytes__" {} \; | xargs rg -l "TypeError" | head -5

Repository: RustPython/RustPython

Length of output: 731


🏁 Script executed:

# Check the test file for __bytes__ tests
cat extra_tests/snippets/builtin_bytes.py

Repository: RustPython/RustPython

Length of output: 20952


🏁 Script executed:

# Look at lines 120-137 more carefully with context
sed -n '115,145p' crates/vm/src/builtins/bytes.rs

Repository: RustPython/RustPython

Length of output: 1364


🏁 Script executed:

# Check if there's specific __bytes__ error validation
rg -B 5 -A 5 "returned non-bytes" crates/

Repository: RustPython/RustPython

Length of output: 47


The downcast usage on line 129 is correct; consider adding a specific error for __bytes__ returning non-bytes.

Line 129 correctly uses downcast::<PyBytes>() (which accepts subclasses) rather than downcast_exact. This matches CPython semantics: bytes() returns the __bytes__ result directly if it's any bytes instance, including subclasses.

However, when __bytes__ returns a non-bytes type, the current code delegates to PyBytesInner::try_from_borrowed_object, which raises a generic error: "can assign only bytes, buffers, or iterables of ints in range(0, 256)". CPython raises a more specific error: "TypeError: bytes returned non-bytes (type X)". Consider adding explicit validation to match CPython's error message.

@youknowone youknowone merged commit a477835 into RustPython:main Dec 11, 2025
32 of 35 checks passed
@youknowone youknowone deleted the py-new branch December 11, 2025 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant