-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Py new #6402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Py new #6402
Conversation
WalkthroughThis PR refactors constructor implementations for built-in Python types (bytes, frozenset, str, tuple) in RustPython, changing how constructor arguments are represented and processed. It moves construction logic from shared helpers into type-specific optimized paths, removes the Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Specific areas requiring extra attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
crates/vm/src/builtins/bytes.rs(1 hunks)crates/vm/src/builtins/set.rs(1 hunks)crates/vm/src/builtins/str.rs(1 hunks)crates/vm/src/builtins/tuple.rs(1 hunks)crates/vm/src/bytes_inner.rs(1 hunks)crates/vm/src/protocol/object.rs(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/builtins/set.rscrates/vm/src/builtins/tuple.rscrates/vm/src/builtins/bytes.rscrates/vm/src/builtins/str.rscrates/vm/src/bytes_inner.rscrates/vm/src/protocol/object.rs
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to Lib/**/*.py : Add a `# TODO: RUSTPYTHON` comment when modifications are made to files in the `Lib/` directory
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
crates/vm/src/protocol/object.rs
🧬 Code graph analysis (4)
crates/vm/src/builtins/set.rs (5)
crates/vm/src/builtins/bytes.rs (2)
slot_new(97-149)py_new(151-153)crates/vm/src/builtins/tuple.rs (3)
slot_new(115-146)input(122-122)py_new(148-152)crates/vm/src/builtins/complex.rs (1)
slot_new(154-167)crates/vm/src/builtins/float.rs (1)
slot_new(134-147)crates/vm/src/builtins/list.rs (1)
from_iter(47-49)
crates/vm/src/builtins/bytes.rs (3)
crates/vm/src/builtins/str.rs (5)
slot_new(353-366)py_new(368-386)elements(2118-2120)elements(2229-2231)elements(2351-2353)crates/vm/src/builtins/tuple.rs (2)
slot_new(115-146)py_new(148-152)crates/vm/src/builtins/float.rs (1)
slot_new(134-147)
crates/vm/src/builtins/str.rs (1)
crates/vm/src/stdlib/typing.rs (1)
args(157-159)
crates/vm/src/protocol/object.rs (2)
crates/vm/src/builtins/bytes.rs (4)
from(44-48)from(52-54)obj(114-114)slot_new(97-149)crates/vm/src/builtins/object.rs (1)
slot_new(35-111)
🔇 Additional comments (9)
crates/vm/src/bytes_inner.rs (1)
3-7: LGTM - Import cleanup aligns with removed functionality.The import changes (removing
identifier,PyTypeRef, and simplifyingpystrimport) are consistent with the removal of theget_bytes()method that previously depended on these. The remaining imports support theget_bytearray_inner()method which is still used by the bytes constructor.crates/vm/src/builtins/str.rs (2)
353-366: LGTM - Optimization for exact str type is correct.The early return optimization correctly:
- Checks for exact
strtype (not subclasses)- Ensures single positional argument with no kwargs
- Verifies the argument is already an exact
strThis matches CPython's behavior and avoids unnecessary string copying.
368-386: LGTM - Clean separation of payload creation logic.The
py_newhelper correctly handles all construction cases:
- Decoding with encoding parameter
- Converting via
str()without encoding- Empty string for missing input
The separation from
slot_newimproves code clarity and follows the pattern established in other builtins.crates/vm/src/builtins/set.rs (1)
921-958: LGTM - Constructor refactor follows the established pattern.The implementation correctly:
- Returns exact
frozensetinput as-is (line 929-933)- Returns empty frozenset singleton for missing iterable (line 936-938)
- Returns empty frozenset singleton for empty result (line 947-950)
- Delegates payload creation to
py_newwhich usesfrom_iterThe
type Args = Vec<PyObjectRef>declaration aligns withpy_new's signature, whileslot_newhandles the conversion fromFuncArgsto the expected elements vector.crates/vm/src/protocol/object.rs (2)
7-16: LGTM - Import updates align with new construction pathway.The import changes correctly add:
FuncArgsfor constructing function argumentsConstructortrait for invoking the slot_new pathwayThese support the refactored
bytes()method below.
34-43: LGTM - Cleaner construction pathway via Constructor trait.The refactored
bytes()method correctly:
- Maintains the
PyInterror handling (bytes shouldn't accept int directly via this path)- Creates
FuncArgsfrom the single object argument- Delegates to
PyBytes::slot_newwhich handles all construction logic including__bytes__method invocation and optimizationsThis centralizes the bytes construction logic and eliminates the now-removed
get_bytes()pathway.crates/vm/src/builtins/tuple.rs (1)
112-152: LGTM - Constructor follows the unified pattern across builtins.The implementation correctly handles all optimization cases:
- Returns exact input tuple as-is (lines 121-125)
- Returns empty tuple singleton for missing iterable (lines 128-130)
- Returns empty tuple singleton for empty result (lines 139-142)
- Properly separates slot binding (
slot_new) from payload creation (py_new)The
type Args = Vec<PyObjectRef>withpy_newacceptingelements: Self::Argsprovides a clean interface whileslot_newhandles theFuncArgstoVecconversion.crates/vm/src/builtins/bytes.rs (2)
94-118: Constructor optimizations look correct and consistent with other builtin types.The optimization pattern correctly mirrors other builtin constructors (tuple, str, float):
- Empty singleton returned for exact type with no arguments
- Exact input bytes returned as-is when no encoding/errors specified
- Use of
downcast_exactproperly ensures only exactPyBytesinstances bypass construction
139-153: Fallback path and simplifiedpy_newlook good.The structure cleanly separates concerns:
slot_newhandles all optimization logic and protocol dispatchpy_newis reduced to simple construction fromVec<u8>- Empty singleton optimization in the fallback path matches the pattern in
tuple.rs
| // Handle __bytes__ method - may return PyBytes directly | ||
| if let OptionalArg::Present(ref obj) = options.source | ||
| && options.encoding.is_missing() | ||
| && options.errors.is_missing() | ||
| && let Some(bytes_method) = vm.get_method(obj.clone(), identifier!(vm, __bytes__)) | ||
| { | ||
| let bytes = bytes_method?.call((), vm)?; | ||
| // If exact bytes type and __bytes__ returns bytes, use it directly | ||
| if cls.is(vm.ctx.types.bytes_type) | ||
| && let Ok(b) = bytes.clone().downcast::<PyBytes>() | ||
| { | ||
| return Ok(b.into()); | ||
| } | ||
| // Otherwise convert to Vec<u8> | ||
| let inner = PyBytesInner::try_from_borrowed_object(vm, &bytes)?; | ||
| let payload = Self::py_new(&cls, inner.elements, vm)?; | ||
| return payload.into_ref_with_type(vm, cls).map(Into::into); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find the PyBytesInner::try_from_borrowed_object implementation
rg -A 20 "fn try_from_borrowed_object" crates/vm/src/builtins/bytes.rsRepository: RustPython/RustPython
Length of output: 238
🏁 Script executed:
# Search for __bytes__ related tests or comments
rg -B 3 -A 3 "__bytes__" crates/vm/src/builtins/bytes.rsRepository: RustPython/RustPython
Length of output: 856
🏁 Script executed:
# Check if there are any tests for bytes constructor with __bytes__
fd -e rs -x rg -l "__bytes__" {} \; | head -20Repository: RustPython/RustPython
Length of output: 222
🏁 Script executed:
# Look at the downcast usage pattern in this file
rg "downcast" crates/vm/src/builtins/bytes.rsRepository: RustPython/RustPython
Length of output: 794
🏁 Script executed:
# Find PyBytesInner definition and try_from_borrowed_object impl
rg -B 5 -A 20 "impl.*PyBytesInner" crates/vm/src/builtins/bytes.rs | head -80Repository: RustPython/RustPython
Length of output: 527
🏁 Script executed:
# Search for PyBytesInner enum/struct definition
rg "^(pub )?struct PyBytesInner|^(pub )?enum PyBytesInner" crates/vm/src/builtins/bytes.rsRepository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Find where PyBytesInner::try_from_borrowed_object is actually implemented
ast-grep --pattern 'impl PyBytesInner {
$$$
fn try_from_borrowed_object($$$) {
$$$
}
$$$
}'Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Look for the actual implementation - search more broadly
rg "fn try_from_borrowed_object" crates/vm/src/builtins/bytes.rs -B 2 -A 30 | head -100Repository: RustPython/RustPython
Length of output: 288
🏁 Script executed:
# Look at the beginning/middle of bytes.rs for PyBytesInner impl
head -300 crates/vm/src/builtins/bytes.rs | tail -200Repository: RustPython/RustPython
Length of output: 5642
🏁 Script executed:
# Search for PyBytesInner impl with try_from_borrowed_object more carefully
rg -B 3 "PyBytesInner::try_from_borrowed_object" crates/vm/src/builtins/bytes.rsRepository: RustPython/RustPython
Length of output: 812
🏁 Script executed:
# Look at the entire impl block for PyBytesInner
sed -n '/^impl PyBytesInner/,/^impl /p' crates/vm/src/builtins/bytes.rs | head -100Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Search across entire vm/src for PyBytesInner::try_from_borrowed_object implementation
rg -B 2 -A 20 "impl.*PyBytesInner.*\{" crates/vm/src/builtins/bytes.rs | grep -A 25 "try_from_borrowed_object"Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# PyBytesInner might be defined as a type alias - search for its definition
rg "type PyBytesInner|struct PyBytesInner|enum PyBytesInner" crates/vm/src/Repository: RustPython/RustPython
Length of output: 119
🏁 Script executed:
# Search in function/buffer.rs since learnings mention it
rg -B 2 -A 25 "try_from_borrowed_object" crates/vm/src/function/buffer.rs | head -80Repository: RustPython/RustPython
Length of output: 2557
🏁 Script executed:
# Look at PyBytesInner implementation in bytes_inner.rs
rg -B 2 -A 25 "fn try_from_borrowed_object" crates/vm/src/bytes_inner.rsRepository: RustPython/RustPython
Length of output: 881
🏁 Script executed:
# Get context around PyBytesInner impl
rg -B 5 -A 30 "impl PyBytesInner" crates/vm/src/bytes_inner.rs | head -100Repository: RustPython/RustPython
Length of output: 1354
🏁 Script executed:
# Find bytes_from_object function
rg -B 3 -A 40 "fn bytes_from_object" crates/vm/src/bytes_inner.rsRepository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Search entire codebase for bytes_from_object
rg -B 3 -A 40 "fn bytes_from_object"Repository: RustPython/RustPython
Length of output: 1345
🏁 Script executed:
# Check if it's imported in bytes_inner.rs
rg "bytes_from_object" crates/vm/src/bytes_inner.rsRepository: RustPython/RustPython
Length of output: 201
🌐 Web query:
CPython bytes constructor __bytes__ return type error message TypeError
💡 Result:
When you call bytes(x), CPython will call x.bytes() (if present) and that method must return a bytes object. If it returns some other type, CPython raises a TypeError such as:
- "TypeError: bytes returned non-bytes (type X)" (e.g. X == int or str). [1][2]
Fix: make bytes return a bytes object (e.g. return b"..." or return some_str.encode('utf-8')). [3]
Sources:
- object.bytes doc / pydoc requirement that bytes should return bytes. [1]
- Python tracker / example error message showing "bytes returned non-bytes (type ...)". [2]
- Explanation and example usage showing the TypeError and correct return type. [3]
🏁 Script executed:
# Search for tests of __bytes__ returning non-bytes
rg -B 5 -A 10 "__bytes__.*non" crates/Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
# Look for __bytes__ error tests
fd -e py -x rg -l "__bytes__" {} \; | xargs rg -l "TypeError" | head -5Repository: RustPython/RustPython
Length of output: 731
🏁 Script executed:
# Check the test file for __bytes__ tests
cat extra_tests/snippets/builtin_bytes.pyRepository: RustPython/RustPython
Length of output: 20952
🏁 Script executed:
# Look at lines 120-137 more carefully with context
sed -n '115,145p' crates/vm/src/builtins/bytes.rsRepository: RustPython/RustPython
Length of output: 1364
🏁 Script executed:
# Check if there's specific __bytes__ error validation
rg -B 5 -A 5 "returned non-bytes" crates/Repository: RustPython/RustPython
Length of output: 47
The downcast usage on line 129 is correct; consider adding a specific error for __bytes__ returning non-bytes.
Line 129 correctly uses downcast::<PyBytes>() (which accepts subclasses) rather than downcast_exact. This matches CPython semantics: bytes() returns the __bytes__ result directly if it's any bytes instance, including subclasses.
However, when __bytes__ returns a non-bytes type, the current code delegates to PyBytesInner::try_from_borrowed_object, which raises a generic error: "can assign only bytes, buffers, or iterables of ints in range(0, 256)". CPython raises a more specific error: "TypeError: bytes returned non-bytes (type X)". Consider adding explicit validation to match CPython's error message.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.