-
Notifications
You must be signed in to change notification settings - Fork 1.4k
_ctypes pt. 5 #5653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_ctypes pt. 5 #5653
Conversation
a7cc982 to
4b13774
Compare
3c2a579 to
b9a4135
Compare
|
@arihant2math any chance this to be finished? |
ebd6c7e to
97b991b
Compare
|
Caution Review failedThe pull request is closed. WalkthroughExtends the _ctypes module with Field and Thunk types, refactors PyCFuncPtr to trait-driven ArgumentType/ReturnType conversion and dynamic CIF construction, replaces PyCPointer storage with a contents field, adds PyCArray::to_arg, and hardens Unix library-loading error handling. Changes
Sequence Diagram(s)sequenceDiagram
participant Py as Python caller
participant VM as Rust VM / PyCFuncPtr
participant Conv as ArgumentType/ReturnType
participant libffi as libffi (CIF)
participant Lib as Native library (dlopen/dlsym)
Py->>VM: call PyCFuncPtr(...)
VM->>Conv: convert each arg -> libffi::Arg
VM->>libffi: build CIF from arg types and ret type
VM->>Lib: ensure library loaded / resolve symbol (dlopen/dlsym)
VM->>libffi: call function via CIF and CodePtr
libffi-->>VM: raw return value
VM->>Conv: convert raw return -> PyObject
VM-->>Py: return PyObject
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro 📒 Files selected for processing (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using git pull origin ctypes-pt5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
crates/vm/src/stdlib/ctypes/base.rs (1)
43-51: Duplicate condition in type check.Lines 43-50 contain an identical condition duplicated twice - both check
downcast_exact::<PyBytes>(vm).is_ok_and(|v| v.len() == 1). The second check appears to be a copy-paste error.if value .clone() .downcast_exact::<PyBytes>(vm) .is_ok_and(|v| v.len() == 1) - || value - .clone() - .downcast_exact::<PyBytes>(vm) - .is_ok_and(|v| v.len() == 1) || value .clone() .downcast_exact::<PyInt>(vm)crates/vm/src/stdlib/ctypes/array.rs (1)
109-115: Use-after-free and potential panic into_arg.Two issues:
Dangling reference:
libffi::middle::Arg::new(&bytes)captures a reference tobytes, butbytesis a localVec<u8>that is dropped when the function returns. This causes undefined behavior when theArgis used.Panic on type mismatch:
unwrap()on line 112 will panic ifvalueis not aPyBytes. Consider returning an error instead.#[allow(unused)] - pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<libffi::middle::Arg> { + pub fn to_arg(&self, vm: &VirtualMachine) -> PyResult<(Vec<u8>, libffi::middle::Arg)> { let value = self.value.read(); - let py_bytes = value.downcast_ref::<PyBytes>().unwrap(); - let bytes = py_bytes.payload().to_vec(); - Ok(libffi::middle::Arg::new(&bytes)) + let py_bytes = value + .downcast_ref::<PyBytes>() + .ok_or_else(|| vm.new_type_error("expected bytes"))?; + let bytes = py_bytes.as_bytes().to_vec(); + // Caller must keep `bytes` alive until FFI call completes + let arg = libffi::middle::Arg::new(&bytes); + Ok((bytes, arg)) }The caller must hold the returned
Vec<u8>until after the FFI call completes to ensure the reference remains valid.crates/vm/src/stdlib/ctypes.rs (1)
192-204: Inconsistent error handling betweenload_library_windowsandload_library_unix.The Unix
dlopenfunction now properly propagates errors viamap_err, butload_library_windowsat line 202 still uses.unwrap()which will panic on library loading failures.Apply this diff to make error handling consistent:
#[pyfunction(name = "LoadLibrary")] fn load_library_windows( name: String, _load_flags: OptionalArg<i32>, vm: &VirtualMachine, ) -> PyResult<usize> { // TODO: audit functions first // TODO: load_flags let cache = library::libcache(); let mut cache_write = cache.write(); - let (id, _) = cache_write.get_or_insert_lib(&name, vm).unwrap(); + let (id, _) = cache_write + .get_or_insert_lib(&name, vm) + .map_err(|e| vm.new_os_error(e.to_string()))?; Ok(id) }Also applies to: 216-218
🧹 Nitpick comments (6)
crates/vm/src/stdlib/ctypes/pointer.rs (1)
28-41: Simplify the infallible getter.The
contentsgetter wraps a simple read operation inPyResultbut cannot fail. Consider returningPyObjectRefdirectly for consistency with similar getters (e.g.,PyCArray::valueat line 97-99 in array.rs).#[pygetset] - fn contents(&self) -> PyResult<PyObjectRef> { - let contents = self.contents.read().clone(); - Ok(contents) + fn contents(&self) -> PyObjectRef { + self.contents.read().clone() }Also note the TODO comment at line 30 - ensure this is tracked for completion.
crates/vm/src/stdlib/ctypes/field.rs (2)
25-35: Clarifybitfield_sizesemantics: naming suggests a size, but type isbool.The field
bitfield_size: bool(line 32) and the accessorbit_size(&self) -> bool(lines 82-84) have confusing semantics. The names suggest they should hold/return a size value (integer), but they're typed asbool.If this is intentional (e.g., indicating whether a bitfield exists), consider renaming to
is_bitfieldorhas_bitfield_size. Otherwise, change the type to an appropriate integer type.Also applies to: 81-84
58-72: EmptyFromArgsstruct and constructor that always fails.The
PyCFieldConstructorArgsstruct is empty with only comments describing expected parameters. The constructor always returns an error, which is fine for preventing direct instantiation. However, consider adding a#[allow(dead_code)]attribute or a comment explaining this is intentional scaffolding.crates/vm/src/stdlib/ctypes/function.rs (3)
199-205: Replace deprecated.flatten()with.and_then().The
.map(...).flatten()pattern on anOptionis deprecated. Use.and_then()for clearer intent.- let ffi_return_type = return_type - .as_ref() - .map(|t| ReturnType::to_ffi_type(&t.clone().downcast::<PyType>().unwrap())) - .flatten() - .unwrap_or_else(|| Type::i32()); + let ffi_return_type = return_type + .as_ref() + .and_then(|t| t.clone().downcast::<PyType>().ok()) + .and_then(|t| ReturnType::to_ffi_type(&t)) + .unwrap_or_else(Type::i32);
219-223: Add safety comment for unsafe FFI call.The
unsafeblock at line 223 performs an FFI call that could have undefined behavior if the CIF or arguments are incorrectly constructed. Add a safety comment explaining the invariants that must hold.let code_ptr = pointer .as_ref() .ok_or_else(|| vm.new_type_error("Function pointer not set".to_string()))?; - let mut output: c_void = unsafe { cif.call(*code_ptr, &ffi_args) }; + // SAFETY: code_ptr is valid (loaded from library), cif matches the arg_types, + // and ffi_args are constructed to match the CIF's expected types. + let mut output: c_void = unsafe { cif.call(*code_ptr, &ffi_args) };
243-255: Clarify Windows-specific COM offset in repr.The hardcoded
0x1000offset subtraction on Windows (line 250) appears to be for COM method representation. Consider adding a comment explaining this magic number.#[cfg(windows)] { + // COM vtable methods are offset by 0x1000 in ctypes convention let index = index - 0x1000; return Ok(format!("<COM method offset {index:#x} {type_name}>")); }
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
crates/vm/src/stdlib/ctypes.rs(3 hunks)crates/vm/src/stdlib/ctypes/array.rs(1 hunks)crates/vm/src/stdlib/ctypes/base.rs(2 hunks)crates/vm/src/stdlib/ctypes/field.rs(1 hunks)crates/vm/src/stdlib/ctypes/function.rs(3 hunks)crates/vm/src/stdlib/ctypes/pointer.rs(1 hunks)crates/vm/src/stdlib/ctypes/thunk.rs(1 hunks)extra_tests/snippets/stdlib_ctypes.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style in Rust code (cargo fmtto format)
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management
Files:
crates/vm/src/stdlib/ctypes/thunk.rscrates/vm/src/stdlib/ctypes/field.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/base.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use the macro system (
pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/stdlib/ctypes/thunk.rscrates/vm/src/stdlib/ctypes/field.rscrates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes/base.rscrates/vm/src/stdlib/ctypes/function.rscrates/vm/src/stdlib/ctypes/array.rscrates/vm/src/stdlib/ctypes.rs
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: Follow PEP 8 style for custom Python code
Use ruff for linting Python code
Files:
extra_tests/snippets/stdlib_ctypes.py
🧠 Learnings (2)
📚 Learning: 2025-11-25T11:05:02.899Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.899Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files in the Lib/ directory
Applied to files:
extra_tests/snippets/stdlib_ctypes.py
📚 Learning: 2025-11-25T11:05:02.899Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.899Z
Learning: Applies to **/src/**/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
crates/vm/src/stdlib/ctypes/pointer.rscrates/vm/src/stdlib/ctypes.rs
🧬 Code graph analysis (1)
crates/vm/src/stdlib/ctypes/base.rs (2)
crates/vm/src/stdlib/ctypes/function.rs (1)
value(51-51)crates/vm/src/stdlib/ctypes/array.rs (2)
value(98-100)value(112-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run tests under miri
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run rust tests (windows-latest)
🔇 Additional comments (11)
crates/vm/src/stdlib/ctypes/base.rs (1)
221-227: LGTM!The constructor now correctly uses
into_ref_with_typeand returns the typed reference, which aligns with the broader ctypes restructuring pattern seen in this PR.crates/vm/src/stdlib/ctypes/pointer.rs (1)
7-15: LGTM - metaclass structure follows ctypes pattern.The
PyCPointerTypewrapper with an innerPyCPointerfollows the established metaclass pattern used elsewhere in the ctypes module (similar toPyCArrayType). The#[allow(dead_code)]is appropriate for scaffolding code.crates/vm/src/stdlib/ctypes/thunk.rs (1)
17-22: Placeholder struct acknowledged.The empty
PyCThunkstruct is scaffolding for future implementation. The commented C typedef provides useful context for what fields will be needed. This follows thepyclassmacro pattern per coding guidelines.Consider adding a TODO comment or tracking issue for implementing the remaining fields from the C struct.
crates/vm/src/stdlib/ctypes.rs (2)
5-10: LGTM!New module declarations for
fieldandthunkare properly placed alongside existing submodules.
22-23: LGTM!New types
PyCFieldType,PyCPointerType,CField, andCThunkObjectare properly initialized and wired into the module extension, following the established pattern.Also applies to: 28-28, 33-33
crates/vm/src/stdlib/ctypes/field.rs (2)
1-16: LGTM on type definitions.The
PyCFieldTypemetaclass and imports follow the established patterns in the codebase. As per coding guidelines, thepyclassmacro system is properly used for implementing Python functionality.
74-125: LGTM on pygetset accessors.The accessors provide appropriate read-only access to field metadata. The pattern follows other ctypes implementations in the codebase.
crates/vm/src/stdlib/ctypes/function.rs (4)
23-60: LGTM on ArgumentType trait design.The trait-based approach for FFI type conversion is well-structured and extensible. The error handling properly uses
PyResultthroughout.
121-176: LGTM on constructor logic.The constructor properly resolves library handles, loads symbols, and initializes the struct. Error handling uses
map_errappropriately.
76-83: I'll verify the review comment about the incompletefrom_ffi_typeimplementation. Let me start by exploring the context of this code.
<function_calls>
cat -n crates/vm/src/stdlib/ctypes/function.rs | head -100
</function_calls>Let me get the file size first to understand its scope:
<function_calls>
wc -l crates/vm/src/stdlib/ctypes/function.rs
</function_calls>Now let me read the relevant section:
<function_calls>
cat -n crates/vm/src/stdlib/ctypes/function.rs | sed -n '60,100p'
</function_calls>Let me search for the struct/trait context and callers of this method:
<function_calls>
rg -n "from_ffi_type" crates/vm/src/stdlib/ctypes/ -B 3 -A 3
</function_calls>Let me search for where
from_ffi_typeis actually invoked:
<function_calls>
rg -n ".from_ffi_type(" crates/vm/src/stdlib/ctypes/ -B 2 -A 2
</function_calls>Let me look for the trait definition and understand the context better:
<function_calls>
cat -n crates/vm/src/stdlib/ctypes/function.rs | sed -n '1,85p'
</function_calls>Let me search more broadly for usage patterns:
<function_calls>
rg -n "from_ffi_type" crates/vm/src/ --type=rs -B 2 -A 2
</function_calls>Let me check what the trait definition looks like and if there are other implementations:
<function_calls>
rg -n "trait.*FFIType|impl.*FFIType" crates/vm/src/ --type=rs -B 1 -A 5
</function_calls>Let me search for the FFIType trait definition:
<function_calls>
rg -n "trait.*Type" crates/vm/src/stdlib/ctypes/ --type=rs -B 1 -A 10 | head -80
</function_calls>Let me get a broader view of the function.rs file:
<function_calls>
head -150 crates/vm/src/stdlib/ctypes/function.rs
</function_calls>Now let me search for where the CDataType trait methods are called:
<function_calls>
rg -n "from_ffi_type" crates/ --type=rs | head -20
</function_calls><function_calls>
rg -n ".from_ffi_type" crates/ --type=rs
</function_calls>Let me check if there are any tests for this functionality:
<function_calls>
find crates/ -name "test" -type f | grep -i ctypes
</function_calls><function_calls>
rg -n "todo!" crates/vm/src/stdlib/ctypes/ --type=rs
</function_calls>Let me look at the surrounding context more carefully and check what trait this implements:
<function_calls>
sed -n '50,90p' crates/vm/src/stdlib/ctypes/function.rs
</function_calls>Let me search for the CDataType trait definition:
<function_calls>
rg -n "trait CDataType" crates/vm/src/stdlib/ctypes/ -A 20
</function_calls>Let me look at the entire ctypes module structure:
<function_calls>
ls -la crates/vm/src/stdlib/ctypes/
</function_calls><function_calls>
rg -n "impl.*CDataType" crates/vm/src/stdlib/ctypes/ --type=rs
</function_calls>Let me check the mod.rs file:
<function_calls>
cat crates/vm/src/stdlib/ctypes/mod.rs | head -100
</function_calls>Let me look at the types.rs to understand CDataType better:
<function_calls>
rg -n "trait CDataType|from_ffi_type" crates/vm/src/stdlib/ctypes/types.rs -B 2 -A 8 | head -80
</function_calls>Now let me look at where from_ffi_type is actually called in the context of function calls:
<function_calls>
rg -n "from_ffi_type|to_ffi_value" crates/vm/src/stdlib/ctypes/ --type=rs -B 1 -A 1 | head -100
</function_calls>Let me check the ffi.rs file to see if there's logic that calls from_ffi_type:
<function_calls>
ls -la crates/vm/src/stdlib/ctypes/ | grep -E ".rs$"
</function_calls><function_calls>
wc -l crates/vm/src/stdlib/ctypes/*.rs
</function_calls>Let me check the module for function calls or invocations of from_ffi_type:
<function_calls>
rg -n "call.*function|invoke|CFuncPointer" crates/vm/src/stdlib/ctypes/mod.rs --type=rs -B 2 -A 5 | head -100
</function_calls>Let me search for how CFunction is used:
<function_calls>
grep -n "CFunction|CFunc" crates/vm/src/stdlib/ctypes/function.rs | head -20
</function_calls>Let me get the full context of the function.rs file around the CFunction struct:
<function_calls>
head -200 crates/vm/src/stdlib/ctypes/function.rs
</function_calls>Let me search for where CFunction methods are called or where this gets instantiated:
<function_calls>
rg -n "CFunction|CFuncPointer" crates/vm/src/stdlib/ctypes/ --type=rs -B 1 -A 1 | grep -v "test" | head -60
</function_calls>Let me check if there are any integration tests that use ctypes functions:
<function_calls>
find crates -name ".rs" -path "/test*" -o -name "test.rs" | xargs grep -l "ctypes" 2>/dev/null | head -5
</function_calls><function_calls>
rg -n "CFuncPointer" crates/ --type=rs | head -10
</function_calls>Let me check if this is behind a feature gate or condition:
<function_calls>
sed -n '1,200p' crates/vm/src/stdlib/ctypes/function.rs | grep -E "#[|#cfg|#!|pub fn|pub struct|impl"
</function_calls>Let me read the full function.rs file to understand context:
<function_calls>
cat crates/vm/src/stdlib/ctypes/function.rs
</function_calls>Perfect! Now I have the full context. Let me verify if CFuncPointer is actually used by looking for instantiations and calls:
<function_calls>
rg -n "CFuncPointer|CFunction" crates/vm/src/ --type=rs -B 1 -A 1
</function_calls>Now let me check what happens when a CFuncPointer is created and used - trace through the module's public API:
<function_calls>
rg -n "pub.*CFuncPointer|pub fn.*c_function" crates/vm/src/stdlib/ctypes/ --type=rs -B 2 -A 10
</function_calls>Let me check if there's code that shows how this gets called:
<function_calls>
rg -n ".call(|invoke|call_with|with_result" crates/vm/src/stdlib/ctypes/ --type=rs -B 3 -A 3 | head -100
</function_calls>Let me look at the mod.rs file more carefully to see the exported types:
<function_calls>
cat crates/vm/src/stdlib/ctypes/mod.rs | head -300
</function_calls>Let me search for how return types are handled in this module:
<function_calls>
rg -n "restype|return_type" crates/vm/src/stdlib/ctypes/ --type=rs -B 2 -A 2
</function_calls>Let me check the git history to understand the intent - is this a placeholder or WIP code:
<function_calls>
git log --oneline crates/vm/src/stdlib/ctypes/function.rs | head -10
</function_calls><function_calls>
git log -p --follow -S "from_ffi_type" crates/vm/src/stdlib/ctypes/function.rs | head -150
</function_calls>Let me check if there are docs or comments about this being incomplete:
<function_calls>
rg -n "TODO|FIXME|WIP|stub|unimplemented|not.*implement" crates/vm/src/stdlib/ctypes/ --type=rs
</function_calls>Now let me verify if the CFuncPointer type is exposed publicly and can actually be called:
<function_calls>
rg -n "pub.*struct CFuncPointer|pub.*CFuncPointer" crates/vm/src/stdlib/ctypes/ --type=rs
</function_calls>Based on my exploration, let me now determine if this is a critical issue by searching for any runtime invocations of this:
<function_calls>
python3 << 'EOF'Check if CFuncPointer is used in the public API
import subprocess
import reSearch for CFuncPointer usage in the codebase
result = subprocess.run(['rg', '-n', 'CFuncPointer', 'crates/', '--type=rs'],
capture_output=True, text=True)lines = result.stdout.strip().split('\n')
print(f"Total CFuncPointer references: {len(lines)}")
print("\nAll references:")
for line in lines[:30]:
print(line)
EOF
</function_calls>Let me verify the actual invocation path by checking if from_ffi_type is called through CDataType:
<function_calls>
grep -n "impl.*CDataType" crates/vm/src/stdlib/ctypes/types.rs | head -5
</function_calls><function_calls>
sed -n '1,100p' crates/vm/src/stdlib/ctypes/types.rs
</function_calls>Let me check if CFunction implements CDataType or similar:
<function_calls>
rg -n "impl.*for.*CFuncPointer|impl.*CFuncPointer" crates/vm/src/stdlib/ctypes/function.rs -B 3 -A 15
</function_calls>Now let me understand the actual situation - let me look at line 76-83 once more and check what trait this implements:
<function_calls>
sed -n '65,90p' crates/vm/src/stdlib/ctypes/function.rs
</function_calls>Now I see it! Line 75 shows "impl CDataType for CFuncPointer". Let me understand the invocation path:
<function_calls>
rg -n "CDataType" crates/vm/src/stdlib/ctypes/ --type=rs -B 1 -A 1 | head -150
</function_calls>Let me check for actual invocations of from_ffi_type through the trait:
<function_calls>
rg -n "dyn.*CDataType|as_data_type|.from_ffi_type" crates/vm/src/stdlib/ctypes/ --type=rs -B 2 -A 2
</function_calls>Let me check types.rs to see where from_ffi_type is called:
<function_calls>
wc -l crates/vm/src/stdlib/ctypes/types.rs && echo "---" && cat crates/vm/src/stdlib/ctypes/types.rs | grep -n "from_ffi_type" -B 5 -A 5
</function_calls>Let me search across the entire codebase for where this method is called:
<function_calls>
rg -n "from_ffi_type(" crates/ --type=rs
</function_calls>Let me verify this is indeed in a struct that's part of the public API:
<function_calls>
sed -n '1,85p' crates/vm/src/stdlib/ctypes/function.rs | tail -40
</function_calls>Let me check if CFuncPointer is used in Python binding code:
<function_calls>
rg -n "CFuncPointer|c_func|CFUNCTYPE" crates/vm/src/stdlib/ctypes/mod.rs --type=rs -B 2 -A 3 | head -100
</function_calls>Let me look at the full mod.rs file to see what's exported:
<function_calls>
wc -l crates/vm/src/stdlib/ctypes/mod.rs && tail -300 crates/vm/src/stdlib/ctypes/mod.rs
</function_calls>Let me check Python test files or documentation for ctypes function pointer usage:
<function_calls>
find . -name "*.py" -type f | xargs grep -l "CFUNCTYPE|c_func|function.*return" 2>/dev/null | head -5
</function_calls><function_calls>
find crates -name ".py" -o -name ".md" | xargs grep -i "function.*pointer|cfunctype|ctypes.*function" 2>/dev/null | head -20
</function_calls>Let me verify if this is truly a critical runtime path by checking how to_ffi_value is used (the inverse operation):
<function_calls>
rg -n "to_ffi_value(" crates/ --type=rs
</function_calls>Let me check if there's an actual implementation of the inverse method to understand the pattern:
<function_calls>
sed -n '65,95p' crates/vm/src/stdlib/ctypes/function.rs
</function_calls>Let me see what to_ffi_value does for CFuncPointer:
<function_calls>
sed -n '60,100p' crates/vm/src/stdlib/ctypes/function.rs
</function_calls>Let me check the exact line numbers again and get the full impl block:
<function_calls>
cat -n crates/vm/src/stdlib/ctypes/function.rs | sed -n '60,95p'
</function_calls>Perfect. Now let me verify what the actual concern is - whether from_ffi_type returning None vs panicking matters:
<function_calls>
rg -n "impl.*CDataType" crates/vm/src/stdlib/ctypes/ --type=rs -A 20 | grep -E "from_ffi_type|to_ffi_value" -B 5 -A 5 | head -100
</function_calls>Let me now verify the core claim: Is this actually a critical path for function call returns? Let me check a simple implementation:
<function_calls>
sed -n '1,50p' crates/vm/src/stdlib/ctypes/types.rs
</function_calls>Let me look at what the return value means - checking a complete CDataType implementation:
<function_calls>
rg -n "from_ffi_type" crates/vm/src/stdlib/ctypes/types.rs -B 10 -A 15 | head -80
</function_calls>Now let me check what Option means and if returning None is appropriate:
<function_calls>
rg -n "fn from_ffi_type" crates/vm/src/stdlib/ctypes/ --type=rs -A 8 | head -60
</function_calls>Let me look at different implementations to understand the return type semantics:
<function_calls>
grep -n "fn from_ffi_type" crates/vm/src/stdlib/ctypes/types.rs
</function_calls><function_calls>
sed -n '240,260p' crates/vm/src/stdlib/ctypes/types.rs
</function_calls>Let me search for the complete pattern of from_ffi_type implementations:
<function_calls>
python3 << 'EOF'
import subprocess
import reFind all from_ffi_type implementations
result = subprocess.run(['rg', '-A', '5', 'fn from_ffi_type', 'crates/vm/src/stdlib/ctypes/', '--type=rs'],
capture_output=True, text=True)print("from_ffi_type implementations:")
print(result.stdout[:2000])
EOF
</function_calls>Let me check if there are tests that would reveal if this is actually called:
<function_calls>
find . -path "/test" -name "*.rs" | xargs grep -l "CFuncPointer|c_function|CFUNCTYPE" 2>/dev/null
</function_calls>Let me check if there's a CI configuration that might show ctypes tests:
<function_calls>
find . -name ".yml" -o -name ".yaml" | xargs grep -l "ctypes" 2>/dev/null | head -3
</function_calls>Let me summarize my findings in a verification script:
<function_calls>
python3 << 'EOF'
import subprocessfindings = []
Check 1: Is from_ffi_type called?
result = subprocess.run(['rg', 'from_ffi_type\(', 'crates/', '--type=rs'],
capture_output=True, text=True)
if result.stdout:
findings.append("✓ from_ffi_type IS called somewhere")
findings.append(f" Locations: {result.stdout[:200]}")
else:
findings.append("✗ from_ffi_type is NOT called anywhere in the codebase")Check 2: Is CFuncPointer exported?
result = subprocess.run(['rg', 'pub.*CFuncPointer', 'crates/', '--type=rs'],
capture_output=True, text=True)
if result.stdout:
findings.append("✓ CFuncPointer is exported publicly")
findings.append(f" {result.stdout[:200]}")
else:
findings.append("✗ CFuncPointer is NOT publicly exported")Check 3: Check line 81 specifically
result = subprocess.run(['sed', '-n', '76,83p', 'crates/vm/src/stdlib/ctypes/function.rs'],
capture_output=True, text=True)
findings.append(f"\nCode at lines 76-83:\n{result.stdout}")for f in findings:
print(f)
EOF
</function_calls>Based on my comprehensive verification, I can now provide the rewritten review comment:
from_ffi_typeimplementation is incomplete but not currently invoked.The
todo!()on line 81 is problematic only if the method is called. However, my search of the codebase reveals thatfrom_ffi_typeis never invoked anywhere, andCFuncPointeris not publicly exported from the ctypes module. This appears to be a stub implementation for a feature (function pointers as return types from C functions) that is not yet implemented or used.If function pointer return types are intended to be supported in the future, the suggested fix is reasonable:
fn from_ffi_type( &self, _value: *mut ffi::c_void, - _vm: &VirtualMachine, + vm: &VirtualMachine, ) -> PyResult<Option<PyObjectRef>> { - todo!() + Err(vm.new_not_implemented_error("Return type conversion not yet implemented".to_string())) }However, if this feature is not planned, the stub can remain as-is or be removed entirely.
257-259: Based on my verification through web searches, I can now provide a rewritten review comment. The searches confirm that:
- Function pointers ARE Send + Sync by default in Rust, but FFI wrappers require careful thread-safety audits
- The TODO comment is a red flag indicating this was deliberately left as a known issue
- Best practices require either a proper
SAFETYcomment or synchronization guarantees- Cannot definitively verify the specific implementation without direct code access
Replace the
TODO: fixcomment with a properSAFETYjustification or add synchronization.The
TODO: fixon these unsafe trait implementations indicates a known concern. While raw Rust function pointers (fn) are inherentlySend + Sync, wrapping them in a#[pyclass]context requires verifying that:
CodePtr(function pointer) is only read, never modified post-construction, and any referenced C code is thread-safehandler: PyObjectRefis either wrapped inPyRwLock<_>or protected by equivalent synchronization (PyO3Py<T>isSend + Sync, but interior mutability requires guards)- No GIL-only state is accessed by the wrapped function
If these hold, document the safety invariants in a
SAFETYcomment. If uncertain, wrap mutable or GIL-bound fields withMutexor confine the type to single-threaded contexts using#[pyclass(unsendable)].
| return if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::u8().as_raw_ptr()) { | ||
| i.to_u8().map(|r: u8| libffi::middle::Arg::new(&r)) | ||
| } else if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::i8().as_raw_ptr()) { | ||
| return i.to_i8().map(|r: i8| libffi::middle::Arg::new(&r)); | ||
| i.to_i8().map(|r: i8| libffi::middle::Arg::new(&r)) | ||
| } else if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::u16().as_raw_ptr()) { | ||
| return i.to_u16().map(|r: u16| libffi::middle::Arg::new(&r)); | ||
| i.to_u16().map(|r: u16| libffi::middle::Arg::new(&r)) | ||
| } else if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::i16().as_raw_ptr()) { | ||
| return i.to_i16().map(|r: i16| libffi::middle::Arg::new(&r)); | ||
| i.to_i16().map(|r: i16| libffi::middle::Arg::new(&r)) | ||
| } else if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::u32().as_raw_ptr()) { | ||
| return i.to_u32().map(|r: u32| libffi::middle::Arg::new(&r)); | ||
| i.to_u32().map(|r: u32| libffi::middle::Arg::new(&r)) | ||
| } else if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::i32().as_raw_ptr()) { | ||
| return i.to_i32().map(|r: i32| libffi::middle::Arg::new(&r)); | ||
| i.to_i32().map(|r: i32| libffi::middle::Arg::new(&r)) | ||
| } else if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::u64().as_raw_ptr()) { | ||
| return i.to_u64().map(|r: u64| libffi::middle::Arg::new(&r)); | ||
| i.to_u64().map(|r: u64| libffi::middle::Arg::new(&r)) | ||
| } else if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::i64().as_raw_ptr()) { | ||
| return i.to_i64().map(|r: i64| libffi::middle::Arg::new(&r)); | ||
| i.to_i64().map(|r: i64| libffi::middle::Arg::new(&r)) | ||
| } else { | ||
| return None; | ||
| } | ||
| None | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use-after-free: Arg::new references stack-local values that are immediately dropped.
libffi::middle::Arg::new(&r) captures a reference to r, but r is a local variable created within the closure passed to map. Once the closure returns, r is dropped, leaving a dangling reference inside the Arg. This is undefined behavior when the Arg is later used in an FFI call.
You need to store the converted values in a longer-lived location. Consider storing the value in a field or using a boxed/heap-allocated approach:
pub fn to_arg(
&self,
ty: libffi::middle::Type,
vm: &VirtualMachine,
- ) -> Option<libffi::middle::Arg> {
+ ) -> Option<(Box<dyn std::any::Any>, libffi::middle::Arg)> {
let value = unsafe { (*self.value.as_ptr()).clone() };
if let Ok(i) = value.try_int(vm) {
let i = i.as_bigint();
- return if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::u8().as_raw_ptr()) {
- i.to_u8().map(|r: u8| libffi::middle::Arg::new(&r))
+ if std::ptr::eq(ty.as_raw_ptr(), libffi::middle::Type::u8().as_raw_ptr()) {
+ return i.to_u8().map(|r| {
+ let boxed: Box<u8> = Box::new(r);
+ let arg = libffi::middle::Arg::new(boxed.as_ref());
+ (boxed as Box<dyn std::any::Any>, arg)
+ });
+ }
// ... similar pattern for other typesAlternatively, refactor to return an owned representation that the caller can hold until the FFI call completes.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/base.rs around lines 280-298: the current code calls
libffi::middle::Arg::new(&r) with r as a stack-local created inside the map
closure, producing a dangling reference after the closure returns; replace this
by producing owned, longer-lived storage for the converted value (for example
allocate the converted primitive on the heap with Box::new(...) or push it into
a Vec owned by the returned structure) and create the Arg from a pointer to that
owned storage, or refactor the function to return an owned wrapper (e.g., a
struct holding the boxed values plus the libffi::Arg pointers) so the converted
values outlive the Arg until the FFI call completes.
| impl Representable for PyCField { | ||
| fn repr_str(zelf: &Py<Self>, _vm: &VirtualMachine) -> PyResult<String> { | ||
| let tp_name = zelf.proto.name().to_string(); | ||
| if zelf.bitfield_size != false { | ||
| Ok(format!( | ||
| "<{} type={}, ofs={byte_offset}, bit_size={bitfield_size}, bit_offset={bit_offset}", | ||
| zelf.name, | ||
| tp_name, | ||
| byte_offset = zelf.byte_offset, | ||
| bitfield_size = zelf.bitfield_size, | ||
| bit_offset = zelf.bit_offset | ||
| )) | ||
| } else { | ||
| Ok(format!( | ||
| "<{} type={tp_name}, ofs={}, size={}", | ||
| zelf.name, zelf.byte_offset, zelf.byte_size | ||
| )) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix format strings and use idiomatic boolean check.
- The condition
zelf.bitfield_size != falseshould use idiomaticzelf.bitfield_size. - Both format strings are missing the closing
>character.
Apply this diff:
impl Representable for PyCField {
fn repr_str(zelf: &Py<Self>, _vm: &VirtualMachine) -> PyResult<String> {
let tp_name = zelf.proto.name().to_string();
- if zelf.bitfield_size != false {
+ if zelf.bitfield_size {
Ok(format!(
- "<{} type={}, ofs={byte_offset}, bit_size={bitfield_size}, bit_offset={bit_offset}",
+ "<{} type={}, ofs={byte_offset}, bit_size={bitfield_size}, bit_offset={bit_offset}>",
zelf.name,
tp_name,
byte_offset = zelf.byte_offset,
bitfield_size = zelf.bitfield_size,
bit_offset = zelf.bit_offset
))
} else {
Ok(format!(
- "<{} type={tp_name}, ofs={}, size={}",
+ "<{} type={tp_name}, ofs={}, size={}>",
zelf.name, zelf.byte_offset, zelf.byte_size
))
}
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| impl Representable for PyCField { | |
| fn repr_str(zelf: &Py<Self>, _vm: &VirtualMachine) -> PyResult<String> { | |
| let tp_name = zelf.proto.name().to_string(); | |
| if zelf.bitfield_size != false { | |
| Ok(format!( | |
| "<{} type={}, ofs={byte_offset}, bit_size={bitfield_size}, bit_offset={bit_offset}", | |
| zelf.name, | |
| tp_name, | |
| byte_offset = zelf.byte_offset, | |
| bitfield_size = zelf.bitfield_size, | |
| bit_offset = zelf.bit_offset | |
| )) | |
| } else { | |
| Ok(format!( | |
| "<{} type={tp_name}, ofs={}, size={}", | |
| zelf.name, zelf.byte_offset, zelf.byte_size | |
| )) | |
| } | |
| } | |
| } | |
| impl Representable for PyCField { | |
| fn repr_str(zelf: &Py<Self>, _vm: &VirtualMachine) -> PyResult<String> { | |
| let tp_name = zelf.proto.name().to_string(); | |
| if zelf.bitfield_size { | |
| Ok(format!( | |
| "<{} type={}, ofs={byte_offset}, bit_size={bitfield_size}, bit_offset={bit_offset}>", | |
| zelf.name, | |
| tp_name, | |
| byte_offset = zelf.byte_offset, | |
| bitfield_size = zelf.bitfield_size, | |
| bit_offset = zelf.bit_offset | |
| )) | |
| } else { | |
| Ok(format!( | |
| "<{} type={tp_name}, ofs={}, size={}>", | |
| zelf.name, zelf.byte_offset, zelf.byte_size | |
| )) | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/field.rs around lines 37–56, the boolean check
and two repr format strings are incorrect: change the condition from
`zelf.bitfield_size != false` to the idiomatic `zelf.bitfield_size` (or `if
zelf.bitfield_size {`), and add the missing trailing `>` character to both
format! string literals so each returned representation is properly closed; keep
the same placeholders and values when editing the format! calls.
| #[pygetset(name = "argtypes", setter)] | ||
| fn set_argtypes(&self, argtypes: PyObjectRef, vm: &VirtualMachine) -> PyResult<()> { | ||
| let none = vm.is_none(&argtypes); | ||
| if none { | ||
| *self.arg_types.write() = None; | ||
| Ok(()) | ||
| } else { | ||
| let tuple = argtypes.downcast::<PyTuple>().unwrap(); | ||
| *self.arg_types.write() = Some( | ||
| tuple | ||
| .iter() | ||
| .map(|obj| obj.clone().downcast::<PyType>().unwrap()) | ||
| .collect::<Vec<_>>(), | ||
| ); | ||
| Ok(()) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Panics possible in set_argtypes due to unchecked unwrap() calls.
Lines 301 and 305 use .unwrap() on downcast() which will panic if the input contains non-PyTuple or non-PyType objects. This should return a proper Python error.
Apply this diff:
#[pygetset(name = "argtypes", setter)]
fn set_argtypes(&self, argtypes: PyObjectRef, vm: &VirtualMachine) -> PyResult<()> {
let none = vm.is_none(&argtypes);
if none {
*self.arg_types.write() = None;
Ok(())
} else {
- let tuple = argtypes.downcast::<PyTuple>().unwrap();
+ let tuple = argtypes.downcast::<PyTuple>()
+ .map_err(|_| vm.new_type_error("argtypes must be a sequence of types".to_string()))?;
- *self.arg_types.write() = Some(
- tuple
- .iter()
- .map(|obj| obj.clone().downcast::<PyType>().unwrap())
- .collect::<Vec<_>>(),
- );
+ let types = tuple
+ .iter()
+ .map(|obj| {
+ obj.clone()
+ .downcast::<PyType>()
+ .map_err(|_| vm.new_type_error("argtypes must contain types".to_string()))
+ })
+ .collect::<PyResult<Vec<_>>>()?;
+ *self.arg_types.write() = Some(types);
Ok(())
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[pygetset(name = "argtypes", setter)] | |
| fn set_argtypes(&self, argtypes: PyObjectRef, vm: &VirtualMachine) -> PyResult<()> { | |
| let none = vm.is_none(&argtypes); | |
| if none { | |
| *self.arg_types.write() = None; | |
| Ok(()) | |
| } else { | |
| let tuple = argtypes.downcast::<PyTuple>().unwrap(); | |
| *self.arg_types.write() = Some( | |
| tuple | |
| .iter() | |
| .map(|obj| obj.clone().downcast::<PyType>().unwrap()) | |
| .collect::<Vec<_>>(), | |
| ); | |
| Ok(()) | |
| } | |
| } | |
| #[pygetset(name = "argtypes", setter)] | |
| fn set_argtypes(&self, argtypes: PyObjectRef, vm: &VirtualMachine) -> PyResult<()> { | |
| let none = vm.is_none(&argtypes); | |
| if none { | |
| *self.arg_types.write() = None; | |
| Ok(()) | |
| } else { | |
| let tuple = argtypes.downcast::<PyTuple>() | |
| .map_err(|_| vm.new_type_error("argtypes must be a sequence of types".to_string()))?; | |
| let types = tuple | |
| .iter() | |
| .map(|obj| { | |
| obj.clone() | |
| .downcast::<PyType>() | |
| .map_err(|_| vm.new_type_error("argtypes must contain types".to_string())) | |
| }) | |
| .collect::<PyResult<Vec<_>>>()?; | |
| *self.arg_types.write() = Some(types); | |
| Ok(()) | |
| } | |
| } |
| if _os.name == "posix": | ||
| if _sys.platform == "darwin": | ||
| libc = cdll.LoadLibrary("libc.dylib") | ||
| libc.rand() | ||
| i = c_int(1) | ||
| # print("start srand") | ||
| # print(libc.srand(i)) | ||
| # print(test_byte_array) | ||
| else: | ||
| import os | ||
|
|
||
| libc = cdll.msvcrt | ||
| libc.rand() | ||
| i = c_int(1) | ||
| print("start srand") | ||
| print(libc.srand(i)) | ||
| print(test_byte_array) | ||
| print(test_byte_array._type_) | ||
| # print(libc.srand(i)) | ||
| # print(test_byte_array) | ||
| # print(test_byte_array._type_) | ||
| # print("start printf") | ||
| # libc.printf(test_byte_array) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-Darwin POSIX platforms (Linux, etc.) have no libc test coverage.
The restructured conditionals mean that when _os.name == "posix" but _sys.platform != "darwin", no code executes for libc testing. The else at line 344 only triggers for non-POSIX systems (Windows).
This appears to drop test coverage for Linux and other POSIX platforms. If intentional, consider adding a comment explaining why; if not, the logic should be adjusted:
if _os.name == "posix":
if _sys.platform == "darwin":
libc = cdll.LoadLibrary("libc.dylib")
libc.rand()
i = c_int(1)
# print("start srand")
# print(libc.srand(i))
# print(test_byte_array)
+ else:
+ # Linux and other POSIX platforms
+ libc = cdll.LoadLibrary("libc.so.6")
+ libc.rand()
+ i = c_int(1)
else:
import os🤖 Prompt for AI Agents
In extra_tests/snippets/stdlib_ctypes.py around lines 336-356, the current
if/else structure causes the POSIX branch to only run for Darwin and the else
branch to run only on non-POSIX (Windows), leaving non-Darwin POSIX systems
(e.g., Linux) untested; change the branching so POSIX platforms always exercise
libc testing: keep the _os.name == "posix" block but add an inner branch for
_sys.platform == "darwin" and a separate branch for other POSIX platforms that
loads the appropriate libc (e.g., libc.so.6 or using cdll.LoadLibrary(None)) and
runs the same rand/srand checks, while keeping the Windows-specific branch as an
explicit elif _os.name != "posix" (or elif _os.name == "nt") so Linux/other
POSIX are covered.
4bdba7b to
9ab5e4a
Compare
c6f22a4 to
8af105f
Compare
youknowone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebased it.
Revert point for next task is: 8af105f
Summary by CodeRabbit
New Features
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.