Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Nov 29, 2025

Summary by CodeRabbit

  • New Features

    • ctypes now use a CData-backed memory model with buffer-protocol support for simple types, arrays, structures, unions and pointers; exposes underlying objects and mutable buffers.
    • New classmethods: construct from addresses, buffers, buffer copies, and DLL symbols; pointer casting and richer size/alignment introspection.
    • C-callable thunks: wrap Python callables as C-callable callbacks.
  • Bug Fixes

    • Stronger validation and clearer errors for NULL pointers, offsets, bounds and type conversions.
  • Documentation

    • Updated example feature flag in docs.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 29, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds StgInfo/CDataObject-backed storage across ctypes types, migrates instance buffers to CDataObject with AsBuffer support and address/buffer constructors, implements bytes↔Python serialization, alignment/size/cast helpers, updates pointers/arrays/structures/unions to use stg_info/cdata, and implements libffi-backed C-callable thunks.

Changes

Cohort / File(s) Summary
Core / CData & util
crates/vm/src/stdlib/ctypes/base.rs, crates/vm/src/stdlib/ctypes/util.rs
Add StgInfo and CDataObject; introduce PyCData, PyCSimpleType, PyCSimple cdata backing, value↔bytes serialization, buffer methods and AsBuffer support; constructors from_stg_info/from_bytes/from_base and size() accessor.
Module & public API
crates/vm/src/stdlib/ctypes.rs
Re-export ctypes base types; add get_align, get_size_from_type, bytes_to_pyobject; refactor size_of, alignment, and cast paths to use StgInfo/CDataObject, and initialize PyCSimple cdata in new_simple_type.
Arrays
crates/vm/src/stdlib/ctypes/array.rs
Replace raw buffer with cdata; store element typ as PyObjectRef; add AsBuffer (ARRAY_BUFFER_METHODS), buffer-backed accessors, to_arg via cdata, and classmethods from_address/from_buffer/from_buffer_copy/in_dll.
Structures
crates/vm/src/stdlib/ctypes/structure.rs
Add stg_info to struct types; replace instance buffer with cdata; compute layout/alignment with helpers; add AsBuffer (STRUCTURE_BUFFER_METHODS), _objects getter, and address/buffer classmethods.
Unions
crates/vm/src/stdlib/ctypes/union.rs
Add stg_info to union types; compute union layout; back instances with cdata; implement AsBuffer (UNION_BUFFER_METHODS), _objects, and address/buffer classmethods.
Pointers & Field descriptors
crates/vm/src/stdlib/ctypes/pointer.rs, crates/vm/src/stdlib/ctypes/field.rs
Pointer types gain stg_info; PyCPointer adds Constructor/__init__ and classmethods (from_address, from_buffer, from_buffer_copy, in_dll); array/pointer construction uses typ: PyObjectRef; field descriptors read/write from cdata.buffer for Structure and Union.
Function callbacks / Thunks
crates/vm/src/stdlib/ctypes/thunk.rs, crates/vm/src/stdlib/ctypes/function.rs
Implement PyCThunk using libffi closures and leaked userdata with Drop cleanup; add ffi↔Python conversion helpers and thunk callback; allow Python callables to create C-callable thunks in PyCFuncPtr and store thunk lifecycle via PyRef<PyCThunk>.
Public surface / Buffer protocol (multiple)
crates/vm/src/stdlib/ctypes/*.rs
Add AsBuffer implementations and BufferMethods for PyCSimple, PyCArray, PyCStructure, PyCUnion; expose from_address/from_buffer/from_buffer_copy broadly; move storage/read/write to cdata.buffer and propagate StgInfo for sizing/alignment.
Docs / misc
.github/copilot-instructions.md, .cspell.dict/cpython.txt
Update example feature flag (ssl→jit) and add swappedbytes to spell-check dictionary.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant C as C caller
participant Cl as libffi Closure
participant VM as RustPython VM (thunk_callback)
participant Py as Python callable
Note right of Cl: Closure holds CodePtr + leaked userdata
C->>Cl: call CodePtr with raw args
Cl->>VM: thunk_callback invoked (raw pointers)
VM->>VM: convert ffi args -> PyObjects (ffi_to_python)
VM->>Py: call Python callable with converted args
Py-->>VM: return PyObject result
VM->>VM: convert PyObject -> ffi return area (python_to_ffi)
VM->>Cl: write return value into closure return area
Cl-->>C: return to C caller

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Pay special attention to:
    • CDataObject ownership/aliasing and from_base lifetime semantics
    • Concurrency and locking around PyRwLock<CDataObject>
    • libffi thunk construction, leaked userdata lifecycle, Drop cleanup, and Send/Sync claims
    • Size/alignment calculations (StgInfo) and field layout consistency across arrays/structs/unions/pointers
    • Validation and error handling in from_buffer/from_address/from_buffer_copy and in_dll symbol resolution

Possibly related PRs

Suggested reviewers

  • arihant2math

Poem

🐇 I burrowed bytes in tidy rows,
CData snug where memory flows.
Thunks leap out when C calls near,
Buffers hum and fields appear.
A rabbit smiles — bytes safe and clear!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'ctypes buffer' is vague and does not clearly convey the scope of changes, which include significant structural refactoring of ctypes types, buffer management, and new public APIs across multiple modules. Consider using a more specific title that captures the main architectural change, such as 'Refactor ctypes to use CDataObject-backed buffer storage' or 'Add buffer protocol support to ctypes types'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 81.31% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 685690f and 36b6d8f.

📒 Files selected for processing (10)
  • .cspell.dict/cpython.txt (1 hunks)
  • crates/vm/src/stdlib/ctypes.rs (7 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (13 hunks)
  • crates/vm/src/stdlib/ctypes/base.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (9 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (1 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the ctypes-buffer branch 3 times, most recently from bff3045 to e71e7a8 Compare November 29, 2025 05:33
@youknowone youknowone marked this pull request as ready for review November 29, 2025 05:33
@github-actions
Copy link
Contributor

Code has been automatically formatted

The code in this PR has been formatted using cargo fmt --all.
Please pull the latest changes before pushing again:

git pull origin ctypes-buffer

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/ctypes/base.rs (1)

45-53: Duplicate condition in set_primitive for type "c".

Lines 46-49 and 50-53 have identical conditions checking downcast_exact::<PyBytes> with v.len() == 1. This appears to be a copy-paste error.

         if value
             .clone()
             .downcast_exact::<PyBytes>(vm)
             .is_ok_and(|v| v.len() == 1)
-            || value
-                .clone()
-                .downcast_exact::<PyBytes>(vm)
-                .is_ok_and(|v| v.len() == 1)
             || value
                 .clone()
                 .downcast_exact::<PyInt>(vm)
🧹 Nitpick comments (7)
crates/vm/src/stdlib/ctypes/field.rs (2)

106-125: Consider extracting duplicated buffer access logic.

The structure and union branches have identical logic for reading from the cdata buffer. This could be refactored to reduce duplication, perhaps by extracting a helper that takes the buffer reference.

-        // Instance attribute access - read value from the structure/union's buffer
-        if let Some(structure) = obj.downcast_ref::<PyCStructure>() {
-            let cdata = structure.cdata.read();
-            let offset = zelf.byte_offset;
-            let size = zelf.byte_size;
-
-            if offset + size <= cdata.buffer.len() {
-                let bytes = &cdata.buffer[offset..offset + size];
-                return PyCField::bytes_to_value(bytes, size, vm);
-            }
-        } else if let Some(union) = obj.downcast_ref::<PyCUnion>() {
-            let cdata = union.cdata.read();
-            let offset = zelf.byte_offset;
-            let size = zelf.byte_size;
-
-            if offset + size <= cdata.buffer.len() {
-                let bytes = &cdata.buffer[offset..offset + size];
-                return PyCField::bytes_to_value(bytes, size, vm);
-            }
-        }
+        // Instance attribute access - read value from the structure/union's buffer
+        let cdata_guard = if let Some(structure) = obj.downcast_ref::<PyCStructure>() {
+            Some(structure.cdata.read())
+        } else if let Some(union) = obj.downcast_ref::<PyCUnion>() {
+            Some(union.cdata.read())
+        } else {
+            None
+        };
+
+        if let Some(cdata) = cdata_guard {
+            let offset = zelf.byte_offset;
+            let size = zelf.byte_size;
+            if offset + size <= cdata.buffer.len() {
+                let bytes = &cdata.buffer[offset..offset + size];
+                return PyCField::bytes_to_value(bytes, size, vm);
+            }
+        }

200-240: Duplicated write logic could be consolidated.

Similar to descr_get, the write logic is duplicated between structure and union branches. The differentiated delete error messages are appropriate, but the assign paths are identical.

crates/vm/src/stdlib/ctypes/structure.rs (1)

318-324: Add safety documentation for unsafe block.

The unsafe block reads from a raw pointer without any validation that the memory is valid or properly sized. While this is inherent to FFI, a safety comment documenting the requirements would be helpful.

         // Read data from address
         if address == 0 || size == 0 {
             return Err(vm.new_value_error("NULL pointer access".to_owned()));
         }
+        // SAFETY: Caller must ensure `address` points to valid memory of at least
+        // `size` bytes that remains valid for the duration of this read.
         let data = unsafe {
             let ptr = address as *const u8;
             std::slice::from_raw_parts(ptr, size).to_vec()
         };
crates/vm/src/stdlib/ctypes/union.rs (1)

87-111: Duplicated get_field_size implementation.

This method is identical to PyCStructType::get_field_size in structure.rs. Consider extracting to a shared helper function to avoid code duplication.

crates/vm/src/stdlib/ctypes/base.rs (2)

563-591: Verify the safety documentation for from_address.

The unsafe block reads from an arbitrary memory address. While this matches CPython's behavior, consider adding a # Safety doc comment explaining the preconditions (valid readable address, correct size) for maintainability.


641-641: Address the TODO: Missing reference tracking could cause use-after-free.

The from_buffer method creates a ctypes instance backed by the source buffer but doesn't store a reference to keep the buffer alive. If the source is garbage-collected while the ctypes instance is still in use, it could lead to undefined behavior.

Would you like me to open an issue to track implementing proper reference tracking via _objects?

crates/vm/src/stdlib/ctypes/array.rs (1)

486-497: Consider validating _type_ attribute exists before downcast.

The _type_ attribute retrieval could fail silently if the class doesn't have this attribute, but the subsequent downcast provides an error. The error message could be more specific.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5aec9d and cfa2fea.

⛔ Files ignored due to path filters (1)
  • Lib/ctypes/wintypes.py is excluded by !Lib/**
📒 Files selected for processing (7)
  • crates/vm/src/stdlib/ctypes.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (12 hunks)
  • crates/vm/src/stdlib/ctypes/base.rs (9 hunks)
  • crates/vm/src/stdlib/ctypes/field.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style in Rust code (cargo fmt to format)
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • crates/vm/src/stdlib/ctypes/field.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/array.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/field.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/array.rs
🧠 Learnings (3)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (4)
crates/vm/src/stdlib/ctypes.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (3)
  • size (178-180)
  • new (161-167)
  • new (205-211)
crates/vm/src/stdlib/ctypes/pointer.rs (4)
crates/vm/src/stdlib/ctypes/base.rs (8)
  • new (161-167)
  • new (205-211)
  • from_address (564-591)
  • from_buffer (594-643)
  • size (178-180)
  • buffer (732-732)
  • buffer (742-742)
  • from_buffer_copy (646-686)
crates/vm/src/stdlib/ctypes/array.rs (5)
  • from_address (404-440)
  • from_buffer (443-506)
  • buffer (579-579)
  • buffer (589-589)
  • from_buffer_copy (509-564)
crates/vm/src/stdlib/ctypes/structure.rs (5)
  • from_address (310-333)
  • from_buffer (336-385)
  • buffer (436-436)
  • buffer (446-446)
  • from_buffer_copy (388-429)
crates/vm/src/stdlib/ctypes/union.rs (5)
  • from_address (136-153)
  • from_buffer (156-199)
  • buffer (243-243)
  • buffer (250-250)
  • from_buffer_copy (202-237)
crates/vm/src/stdlib/ctypes/structure.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (8)
  • new (161-167)
  • new (205-211)
  • std (544-544)
  • std (547-547)
  • std (550-550)
  • size (178-180)
  • buffer (732-732)
  • buffer (742-742)
crates/vm/src/stdlib/ctypes/union.rs (2)
crates/vm/src/stdlib/ctypes.rs (2)
  • get_size (140-159)
  • size_of (361-374)
crates/vm/src/stdlib/ctypes/pointer.rs (6)
  • std (29-29)
  • std (115-115)
  • std (156-156)
  • from_address (88-98)
  • from_buffer (101-142)
  • from_buffer_copy (145-178)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
🔇 Additional comments (21)
crates/vm/src/stdlib/ctypes/field.rs (1)

8-8: LGTM!

Import for PyCUnion correctly added to support the new union field access.

crates/vm/src/stdlib/ctypes.rs (3)

48-48: LGTM!

Import correctly updated to include CDataObject for the new buffer-backed storage model.


161-175: Dead code annotation - verify usage or remove.

The #[allow(dead_code)] attribute suggests this function isn't currently used. If it's intended for future use, consider adding a TODO comment explaining the planned usage. Otherwise, remove it to keep the codebase clean.


345-350: LGTM!

The PyCSimple initialization correctly creates a CDataObject with the appropriate size for the simple type, enabling buffer-backed storage.

crates/vm/src/stdlib/ctypes/pointer.rs (5)

9-11: LGTM!

Imports correctly added for the new buffer-backed storage model.


30-39: LGTM!

The array type creation correctly uses the new storage model with object-wrapped type references and zero-sized CDataObject for type-level metadata.


87-98: LGTM!

The from_address implementation correctly validates against NULL pointer access and stores the address value.


100-142: LGTM!

The from_buffer implementation correctly validates offset, buffer writability, and bounds before reading the pointer value.


144-178: LGTM!

The from_buffer_copy implementation correctly handles offset validation and bounds checking. Unlike from_buffer, it appropriately doesn't require a writable source since it's copying data.

crates/vm/src/stdlib/ctypes/structure.rs (4)

1-9: LGTM!

Imports correctly updated for the new buffer-backed storage model with CDataObject, buffer protocol types, and AsBuffer trait.


335-385: LGTM!

The from_buffer implementation correctly validates offset, buffer writability, and performs proper bounds checking before copying data.


387-429: LGTM!

The from_buffer_copy implementation correctly handles offset validation and bounds checking for copying data from a bytes-like source.


432-466: LGTM!

The buffer methods and AsBuffer implementation correctly expose the underlying cdata buffer with proper locking semantics. The readonly=false setting is appropriate for ctypes which allows mutation.

crates/vm/src/stdlib/ctypes/union.rs (4)

1-11: LGTM!

Imports correctly updated for the new buffer-backed union implementation with CDataObject, buffer protocol types, and AsBuffer trait.


155-199: LGTM!

The from_buffer implementation correctly validates offset, buffer writability, performs bounds checking, and copies the data.


201-238: LGTM!

The from_buffer_copy implementation correctly handles offset validation, bounds checking, and data copying.


240-268: LGTM!

The buffer methods and AsBuffer implementation correctly expose the union's cdata buffer with proper locking semantics.

crates/vm/src/stdlib/ctypes/base.rs (2)

148-181: LGTM!

The CDataObject struct provides a clean abstraction for ctypes memory-backed storage with appropriate constructor helpers.


728-762: LGTM!

The AsBuffer implementation correctly maps the PyRwLock guards to buffer access methods, enabling proper buffer protocol support for PyCSimple.

crates/vm/src/stdlib/ctypes/array.rs (2)

403-440: Inconsistent error handling: from_address returns error for NULL but from_buffer/from_buffer_copy would proceed.

from_address properly validates address == 0 || size == 0 and returns an error. However, this is the only classmethod with explicit NULL protection. The validation is good here.


575-609: LGTM!

The AsBuffer implementation for PyCArray follows the same correct pattern as PyCSimple, properly mapping lock guards to buffer access.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
crates/vm/src/stdlib/ctypes/array.rs (1)

36-67: CDataObject integration for arrays and the to_arg fix look correct

  • Type and instance ctors now consistently wrap the underlying Vec in CDataObject, and PyCArrayType / PyCArray store the element type as PyObjectRef, which plays well with nested array types.
  • to_arg now takes &cdata.buffer instead of &cdata.buffer.clone(), fixing the previous dangling-pointer issue by ensuring the libffi Arg points into the live backing buffer.
  • ARRAY_BUFFER_METHODS and AsBuffer for PyCArray mirror the patterns used for simple types and structures, providing a writable view over cdata.buffer.

These changes align the array implementation with the new shared CData model and make the FFI path safer.

Also applies to: 211-219, 570-575, 578-612

crates/vm/src/stdlib/ctypes/base.rs (1)

14-15: Fix platform‑dependent size mismatches for c_wchar/c_long/c_ulong in value_to_bytes

The new value_to_bytes drives the backing CDataObject size for simple types, so size mistakes here now directly affect .cdata.buffer and the buffer protocol:

  • "u" (c_wchar) always emits 4 bytes, while _ctypes::get_size("u") and _ctypes::bytes_to_pyobject use the native WideChar width. On Windows (2‑byte wchar_t) this will make c_wchar instances expose a 4‑byte buffer even though sizeof(c_wchar) is 2.
  • "l"/"L" still hardcode 4‑byte layouts, the same issue previously raised for 64‑bit platforms where c_long/c_ulong are 8 bytes. With the new buffer backing this now also mis-sizes PyCSimple.cdata and AsBuffer views for these types.

Consider making these arms respect the native C sizes, e.g.:

-use std::ffi::{c_uint, c_ulong, c_ulonglong, c_ushort};
+use std::ffi::{c_long, c_uint, c_ulong, c_ulonglong, c_ushort};
+use widestring::WideChar;
@@
-        "u" => {
-            // c_wchar - 4 bytes (wchar_t on most platforms)
-            if let Ok(s) = value.str(vm)
-                && let Some(c) = s.as_str().chars().next()
-            {
-                return (c as u32).to_ne_bytes().to_vec();
-            }
-            vec![0; 4]
-        }
+        "u" => {
+            // c_wchar - native WideChar width
+            const SIZE: usize = std::mem::size_of::<WideChar>();
+            if let Ok(s) = value.str(vm)
+                && let Some(c) = s.as_str().chars().next()
+            {
+                let code = c as u32;
+                if SIZE == 2 {
+                    let w = code as u16;
+                    return w.to_ne_bytes().to_vec();
+                } else {
+                    return code.to_ne_bytes().to_vec();
+                }
+            }
+            vec![0; SIZE]
+        }
@@
-        "l" => {
-            // c_long (platform dependent, using 4 bytes)
-            if let Ok(int_val) = value.try_int(vm)
-                && let Some(v) = int_val.as_bigint().to_i32()
-            {
-                return v.to_ne_bytes().to_vec();
-            }
-            vec![0; 4]
-        }
-        "L" => {
-            // c_ulong (platform dependent, using 4 bytes)
-            if let Ok(int_val) = value.try_int(vm)
-                && let Some(v) = int_val.as_bigint().to_u32()
-            {
-                return v.to_ne_bytes().to_vec();
-            }
-            vec![0; 4]
-        }
+        "l" => {
+            // c_long (platform dependent)
+            let size = std::mem::size_of::<c_long>();
+            if let Ok(int_val) = value.try_int(vm) {
+                let i = int_val.as_bigint();
+                if size == 4 {
+                    if let Some(v) = i.to_i32() {
+                        return v.to_ne_bytes().to_vec();
+                    }
+                } else if size == 8 {
+                    if let Some(v) = i.to_i64() {
+                        return v.to_ne_bytes().to_vec();
+                    }
+                }
+            }
+            vec![0; size]
+        }
+        "L" => {
+            // c_ulong (platform dependent)
+            let size = std::mem::size_of::<c_ulong>();
+            if let Ok(int_val) = value.try_int(vm) {
+                let i = int_val.as_bigint();
+                if size == 4 {
+                    if let Some(v) = i.to_u32() {
+                        return v.to_ne_bytes().to_vec();
+                    }
+                } else if size == 8 {
+                    if let Some(v) = i.to_u64() {
+                        return v.to_ne_bytes().to_vec();
+                    }
+                }
+            }
+            vec![0; size]
+        }

This also addresses the earlier review about 4‑byte c_long/c_ulong while keeping behavior correct on both 32‑ and 64‑bit targets.

Also applies to: 277-420

crates/vm/src/stdlib/ctypes/union.rs (1)

133-153: Union.from_address still doesn’t read from the given address (and still uses size_of(Either::A(cls)))

This method is still constructing a zero-initialized buffer with CDataObject::new(size) and never reading from address, so the resulting union doesn’t reflect the memory at that location. It also uses size_of(Either::A(cls)), which is unlikely to hit the size_of_instances path on the type object.

Align it with the Structure.from_address pattern instead:

     #[pyclassmethod]
     fn from_address(cls: PyTypeRef, address: isize, vm: &VirtualMachine) -> PyResult {
         use crate::function::Either;
         use crate::stdlib::ctypes::_ctypes::size_of;
 
-        // Get size from cls
-        let size = size_of(Either::A(cls.clone()), vm)?;
-
-        // Create instance with data from address
+        // Get size from cls
+        let size = size_of(Either::B(cls.as_object().clone()), vm)?;
+
+        // Create instance with data from address
         if address == 0 || size == 0 {
             return Err(vm.new_value_error("NULL pointer access".to_owned()));
         }
 
-        Ok(PyCUnion {
-            cdata: PyRwLock::new(CDataObject::new(size)),
-        }
+        let data = unsafe {
+            let ptr = address as *const u8;
+            std::slice::from_raw_parts(ptr, size).to_vec()
+        };
+
+        Ok(PyCUnion {
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
+        }
         .into_ref_with_type(vm, cls)?
         .into())
     }

This addresses the original concern and makes from_address actually reflect the pointed-to memory.

🧹 Nitpick comments (3)
crates/vm/src/stdlib/ctypes/field.rs (1)

106-125: Union support in CField looks correct; consider surfacing OOB write errors

The new structure/union branches correctly use cdata and guard against offset + size > buffer.len(), matching the new buffer model. Right now, failed bounds checks just skip the write (and reads fall back to 0), which can make descriptor/size bugs hard to notice. If you ever see mis-sized fields at runtime, it may be worth turning these silent skips into explicit errors or at least debug assertions.

Also applies to: 200-238

crates/vm/src/stdlib/ctypes/pointer.rs (1)

30-37: Pointer constructors are safe and consistent with current .contents semantics

The new from_address / from_buffer / from_buffer_copy helpers perform the right null/offset/bounds checks and only stash the address as a Python int in contents, which is safe given that .contents is still marked “TODO: not correct”. Once you implement real pointee semantics, you’ll likely want these paths to also consult _type_ and keep a reference to the source buffer for from_buffer/from_buffer_copy, but for now the implementation is internally consistent.

Also applies to: 87-178

crates/vm/src/stdlib/ctypes/array.rs (1)

279-295: Avoid cloning the whole buffer in getitem_by_index for large arrays

getitem_by_index currently does:

let buffer = zelf.cdata.read().buffer.clone();
if offset + element_size <= buffer.len() {
    let bytes = &buffer[offset..offset + element_size];
    ...
}

This clones the entire backing buffer on every element access. For large arrays, it would be cheaper to slice directly under the read lock:

-        let buffer = zelf.cdata.read().buffer.clone();
-        if offset + element_size <= buffer.len() {
-            let bytes = &buffer[offset..offset + element_size];
+        let cdata = zelf.cdata.read();
+        if offset + element_size <= cdata.buffer.len() {
+            let bytes = &cdata.buffer[offset..offset + element_size];

The semantics stay the same, but you avoid an O(n) copy per access.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfa2fea and 9af05c3.

📒 Files selected for processing (7)
  • crates/vm/src/stdlib/ctypes.rs (4 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (12 hunks)
  • crates/vm/src/stdlib/ctypes/base.rs (9 hunks)
  • crates/vm/src/stdlib/ctypes/field.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style in Rust code (cargo fmt to format)
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/field.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/union.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/field.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/union.rs
🧠 Learnings (5)
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/base.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/base.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/base.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In RustPython's pattern matching implementation, only certain builtin types should have the SEQUENCE flag: list and tuple are confirmed sequences. The user youknowone indicated that bytes, bytearray are not considered sequences in this context, even though they implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (5)
crates/vm/src/stdlib/ctypes.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (3)
  • size (178-180)
  • new (161-167)
  • new (205-211)
crates/vm/src/stdlib/ctypes/field.rs (2)
crates/vm/src/stdlib/ctypes/base.rs (14)
  • size (178-180)
  • value (46-48)
  • value (50-52)
  • value (54-56)
  • value (86-86)
  • value (96-96)
  • value (106-106)
  • value (114-114)
  • value (115-115)
  • value (126-126)
  • value (137-137)
  • value (138-138)
  • value (282-282)
  • value (471-516)
crates/vm/src/stdlib/ctypes/array.rs (2)
  • value (367-371)
  • value (375-375)
crates/vm/src/stdlib/ctypes/structure.rs (2)
crates/vm/src/stdlib/ctypes.rs (2)
  • get_size (140-159)
  • size_of (380-393)
crates/vm/src/stdlib/ctypes/base.rs (9)
  • new (161-167)
  • new (205-211)
  • std (544-544)
  • std (547-547)
  • std (550-550)
  • size (178-180)
  • from_bytes (169-175)
  • buffer (732-732)
  • buffer (742-742)
crates/vm/src/stdlib/ctypes/array.rs (2)
crates/vm/src/stdlib/ctypes/base.rs (9)
  • new (161-167)
  • new (205-211)
  • from_bytes (169-175)
  • buffer (732-732)
  • buffer (742-742)
  • std (544-544)
  • std (547-547)
  • std (550-550)
  • size (178-180)
crates/vm/src/stdlib/ctypes.rs (1)
  • size_of (380-393)
crates/vm/src/stdlib/ctypes/union.rs (3)
crates/vm/src/stdlib/ctypes/structure.rs (9)
  • field (72-73)
  • field (233-233)
  • std (133-133)
  • std (144-144)
  • from_address (310-333)
  • from_buffer (336-385)
  • buffer (436-436)
  • buffer (446-446)
  • from_buffer_copy (388-429)
crates/vm/src/stdlib/ctypes/base.rs (12)
  • std (544-544)
  • std (547-547)
  • std (550-550)
  • from_address (564-591)
  • size (178-180)
  • new (161-167)
  • new (205-211)
  • from_buffer (594-643)
  • buffer (732-732)
  • buffer (742-742)
  • from_bytes (169-175)
  • from_buffer_copy (646-686)
crates/vm/src/stdlib/ctypes/pointer.rs (6)
  • std (29-29)
  • std (115-115)
  • std (156-156)
  • from_address (88-98)
  • from_buffer (101-142)
  • from_buffer_copy (145-178)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run tests under miri
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
🔇 Additional comments (6)
crates/vm/src/stdlib/ctypes.rs (1)

161-175: New size/byte helpers align with the buffer-backed model

get_size_from_type and bytes_to_pyobject correctly derive sizes from _type_ and use platform C types (c_short, c_long, c_double, WideChar, etc.) for decoding, and new_simple_type now initializes PyCSimple.cdata with an appropriately sized CDataObject. This ties the high‑level ctypes API cleanly into the new shared buffer representation.

Also applies to: 177-340, 364-369

crates/vm/src/stdlib/ctypes/base.rs (1)

148-181: CDataObject abstraction and PyCSimple buffer exposure look solid

The new CDataObject wrapper cleanly centralizes buffer/base/object tracking, and wiring it through PyCData / PyCSimple.cdata plus SIMPLE_BUFFER_METHODS and AsBuffer for PyCSimple gives a consistent, writable buffer view for simple ctypes. The nested PyMappedRwLock*Guard::map usage correctly exposes buffer.as_slice() / buffer.as_mut_slice() without leaking locks.

Also applies to: 183-195, 256-267, 728-763

crates/vm/src/stdlib/ctypes/structure.rs (2)

187-207: cdata-backed PyCStructure and buffer protocol integration look good

Switching PyCStructure to a shared CDataObject and wiring STRUCTURE_BUFFER_METHODS / AsBuffer to expose cdata.buffer gives a coherent, writable view of structure storage. The debug impl reporting cdata.size() is also a nice sanity aid when diagnosing layout issues.

Also applies to: 432-466


309-385: Unable to verify through code access due to repository clone failures. Based on the provided code snippet and learnings, I cannot confirm the specific technical claims about Either enum behavior and PyBuffer structure through direct inspection.

from_address / from_buffer should query size via the type object and keep from_buffer's source alive — verification inconclusive

The review raises concerns about:

  1. Using size_of(Either::A(cls.clone())) vs Either::B(cls.as_object().clone()) for accessing type's size_of_instances hook
  2. Passing None vs Some(buffer.obj.clone()) to preserve buffer lifetime

However, I cannot directly verify these claims in the codebase due to infrastructure limitations. The concerns rest on:

  • Whether the Either enum truly has branches A and B with the described semantics
  • Whether PyBuffer has an obj field accessible as described
  • Whether the distinction between Either variants actually affects how size_of resolves for structure types

These are architectural decisions that require examination of the Either enum definition, size_of function implementation, and PyBuffer structure. Without this access, the validity of the proposed changes cannot be confirmed.

crates/vm/src/stdlib/ctypes/union.rs (1)

117-123: cdata-backed PyCUnion and buffer protocol hook-in look consistent

The PyCUnion struct’s cdata field plus UNION_BUFFER_METHODS and AsBuffer give unions the same buffer behavior as simple types and structures. Mapping the PyRwLock guards down to CDataObject.buffer is clean and should work well with the rest of the new ctypes buffer model.

Also applies to: 240-269

crates/vm/src/stdlib/ctypes/array.rs (1)

403-440: I apologize, but I'm encountering persistent infrastructure issues that prevent me from verifying this review comment:

  1. Repository cloning failed: Multiple attempts to clone the RustPython repository have failed
  2. Web search limitations: Generic searches don't return the specific implementation details needed to verify the size_of function behavior with Either::A vs Either::B

The review comment makes a specific technical claim about how _ctypes::size_of() should be called with array types, but I cannot:

  • Access the actual implementation of size_of() to understand how it handles Either::A vs Either::B
  • Verify whether array types have size_of_instances implemented
  • Confirm the current code actually fails as claimed
  • Validate that the proposed fix is correct

I recommend manual verification by:

  1. Checking the _ctypes::size_of() implementation to understand how it processes Either::A(PyTypeRef) vs Either::B(PyObjectRef)
  2. Verifying whether size_of_instances is implemented on array types
  3. Testing whether the current code produces the claimed error or if it works correctly
  4. Confirming the proposed fix resolves any actual issue

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (3)
crates/vm/src/stdlib/ctypes/union.rs (2)

197-214: from_address doesn't read data from the provided memory address.

This implementation creates a zero-initialized buffer instead of reading the actual bytes at the given address. The stg_info is used to create an empty CDataObject, but the data at address is never read.

+        // SAFETY: Caller must ensure address points to valid memory of at least size bytes
+        let data = unsafe {
+            let ptr = address as *const u8;
+            std::slice::from_raw_parts(ptr, size).to_vec()
+        };
+
         Ok(PyCUnion {
-            cdata: PyRwLock::new(CDataObject::from_stg_info(&stg_info)),
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
         }

250-259: from_buffer should preserve reference to source buffer.

The None passed to CDataObject::from_bytes means no reference to the source buffer is kept. If the source buffer is garbage collected, data integrity could be compromised for shared buffer scenarios.

-            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, Some(buffer.obj.clone()))),
crates/vm/src/stdlib/ctypes/array.rs (1)

750-755: Critical: to_arg still creates a dangling pointer.

This was flagged in a previous review. Arg::new(&cdata.buffer) borrows a reference to data inside the RwLockReadGuard, but the guard is dropped when the function returns, making the Arg hold a dangling reference.

The returned Arg will point to freed memory. The caller must ensure the buffer outlives the FFI call:

 impl PyCArray {
     #[allow(unused)]
-    pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<libffi::middle::Arg> {
-        let cdata = self.cdata.read();
-        Ok(libffi::middle::Arg::new(&cdata.buffer))
+    pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<(Vec<u8>, libffi::middle::Arg)> {
+        let buffer = self.cdata.read().buffer.clone();
+        // Caller must keep the Vec alive for the duration of the FFI call
+        let arg = libffi::middle::Arg::new(&buffer);
+        Ok((buffer, arg))
     }
 }

Note: This fix returns the owned buffer so the caller can keep it alive. The exact solution depends on how to_arg is used in FFI calls.

🧹 Nitpick comments (8)
crates/vm/src/stdlib/ctypes/field.rs (2)

106-129: Silent fallback on bounds check failure may hide bugs.

When offset + size > cdata.buffer.len(), the code falls through to return vm.ctx.new_int(0) without any indication that the access was out of bounds. This could mask programming errors in field offset calculations.

Consider returning an error for out-of-bounds access:

             if offset + size <= cdata.buffer.len() {
                 let bytes = &cdata.buffer[offset..offset + size];
                 return PyCField::bytes_to_value(bytes, size, vm);
+            } else {
+                return Err(vm.new_value_error(format!(
+                    "field offset {} + size {} exceeds buffer length {}",
+                    offset, size, cdata.buffer.len()
+                )));
             }

200-240: Consider extracting common buffer write logic.

The PyCStructure and PyCUnion branches share nearly identical logic for writing to cdata.buffer. A helper method or macro could reduce duplication:

// Example helper approach
fn write_to_cdata_buffer(
    cdata: &mut CDataObject,
    offset: usize,
    size: usize,
    bytes: &[u8],
) -> bool {
    if offset + size <= cdata.buffer.len() {
        cdata.buffer[offset..offset + size].copy_from_slice(bytes);
        true
    } else {
        false
    }
}
crates/vm/src/stdlib/ctypes/base.rs (2)

794-798: Address TODO: Store buffer reference to prevent premature GC.

The TODO notes that from_buffer should store a reference to the source buffer in _objects. While the value is copied, failing to keep the source alive could cause issues if the ctypes object is expected to share memory with the original buffer (which is the semantic of from_buffer vs from_buffer_copy).

Consider implementing this before merge or tracking it as a follow-up issue.

Would you like me to open an issue to track implementing buffer reference storage?


959-978: Consider simplifying the double-map pattern.

The nested PyMappedRwLockReadGuard::map(PyRwLockReadGuard::map(...)) pattern is verbose. A simpler approach would be:

obj_bytes: |buffer| {
    rustpython_common::lock::PyRwLockReadGuard::map(
        buffer.obj_as::<PyCSimple>().cdata.read(),
        |x| x.buffer.as_slice(),
    )
    .into()
},

The intermediate map to &CDataObject before mapping to the slice appears redundant.

crates/vm/src/stdlib/ctypes/pointer.rs (2)

134-176: from_buffer reads pointer value correctly but doesn't track source reference.

The implementation correctly reads a pointer-sized value from the buffer. However, unlike CPython's from_buffer, this doesn't maintain a reference to the source buffer. For pointers this is less critical since only a usize is read, but for consistency with other ctypes, consider storing the source reference in CDataObject.objects.


249-259: Unsafe symbol access looks correct but lacks documentation.

The unsafe block correctly retrieves the symbol address from the library. The dereference *symbol gets the address stored in the symbol table. Consider adding a safety comment explaining why this is safe (library is locked, symbol lifetime is managed by libloading).

         let symbol_address = if let Some(lib) = &*inner_lib {
             unsafe {
+                // SAFETY: Library is held via lock, and symbol is valid for the lifetime
+                // of the library reference. We only read the address value.
                 // Try to get the symbol from the library
                 let symbol: Symbol<'_, *mut u8> = lib.get(symbol_name.as_bytes()).map_err(|e| {
                     vm.new_attribute_error(format!("{}: symbol '{}' not found", e, name.as_str()))
                 })?;
                 *symbol as usize
             }
crates/vm/src/stdlib/ctypes/structure.rs (1)

354-357: Unsafe memory read from arbitrary address - expected but dangerous.

This correctly implements CPython's from_address behavior, which is inherently unsafe. The caller is responsible for ensuring the address points to valid, properly-sized memory. Consider adding a safety comment documenting the contract.

         // Read data from address
         if address == 0 || size == 0 {
             return Err(vm.new_value_error("NULL pointer access".to_owned()));
         }
+        // SAFETY: Caller must ensure `address` points to valid, readable memory
+        // of at least `size` bytes. This matches CPython's from_address behavior.
         let data = unsafe {
             let ptr = address as *const u8;
             std::slice::from_raw_parts(ptr, size).to_vec()
         };
crates/vm/src/stdlib/ctypes/array.rs (1)

657-747: Duplicate in_dll implementation between PyCArrayType and PyCArray.

There are two in_dll implementations with nearly identical logic - one on PyCArrayType (lines 123-202) and one on PyCArray (lines 657-747). Consider consolidating to reduce code duplication.

The implementations are almost identical except for how they obtain element_type/length. Consider extracting the common DLL symbol resolution logic into a helper function.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9af05c3 and a0d067e.

📒 Files selected for processing (11)
  • .github/copilot-instructions.md (1 hunks)
  • crates/vm/src/stdlib/ctypes.rs (7 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (13 hunks)
  • crates/vm/src/stdlib/ctypes/base.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/field.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (9 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style in Rust code (cargo fmt to format)
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/field.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/util.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/field.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/util.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
🧠 Learnings (7)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-11-25T11:05:02.900Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.900Z
Learning: When testing Python code, use RustPython (`cargo run -- script.py`) instead of the standard `python` command

Applied to files:

  • .github/copilot-instructions.md
📚 Learning: 2025-11-25T11:05:02.899Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.899Z
Learning: Applies to **/*test*.py : CRITICAL: Do not modify test assertions, test logic, or test data

Applied to files:

  • .github/copilot-instructions.md
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-11-25T11:05:02.899Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.899Z
Learning: Applies to **/src/**/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (9)
crates/vm/src/stdlib/ctypes/union.rs (4)
crates/vm/src/stdlib/ctypes/structure.rs (7)
  • std (136-136)
  • std (165-165)
  • from_address (344-366)
  • from_buffer (369-417)
  • buffer (467-467)
  • buffer (477-477)
  • as_buffer (489-497)
crates/vm/src/stdlib/ctypes/util.rs (2)
  • default (15-24)
  • new (28-37)
crates/vm/src/stdlib/ctypes/base.rs (14)
  • new (253-259)
  • size (210-212)
  • std (515-515)
  • std (523-523)
  • std (700-700)
  • std (703-703)
  • std (706-706)
  • from_stg_info (173-180)
  • from_address (720-747)
  • from_buffer (750-799)
  • buffer (963-963)
  • buffer (973-973)
  • from_bytes (183-190)
  • as_buffer (985-993)
crates/vm/src/stdlib/ctypes/function.rs (1)
  • f (368-368)
crates/vm/src/stdlib/ctypes/field.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (18)
  • size (210-212)
  • value (54-56)
  • value (58-60)
  • value (62-64)
  • value (94-94)
  • value (104-104)
  • value (114-114)
  • value (122-122)
  • value (123-123)
  • value (134-134)
  • value (145-145)
  • value (146-146)
  • value (296-296)
  • value (316-316)
  • value (327-327)
  • value (335-335)
  • value (344-344)
  • value_to_bytes (431-571)
crates/vm/src/stdlib/ctypes/base.rs (1)
crates/vm/src/stdlib/ctypes.rs (25)
  • bytes_to_pyobject (185-347)
  • new_simple_type (351-387)
  • std (417-417)
  • std (434-434)
  • std (634-634)
  • type_attr (651-651)
  • get_size (142-161)
  • get_align (164-166)
  • mem (144-144)
  • mem (145-145)
  • mem (146-146)
  • mem (147-147)
  • mem (148-148)
  • mem (149-149)
  • mem (150-150)
  • mem (151-151)
  • mem (152-152)
  • mem (153-153)
  • mem (154-154)
  • mem (155-155)
  • mem (156-156)
  • mem (157-157)
  • mem (158-158)
  • mem (212-212)
  • size_of (391-439)
crates/vm/src/stdlib/ctypes/util.rs (4)
crates/vm/src/stdlib/ctypes/union.rs (1)
  • default (22-26)
crates/vm/src/stdlib/ctypes/base.rs (2)
  • new (253-259)
  • size (210-212)
crates/vm/src/stdlib/ctypes/thunk.rs (1)
  • new (212-297)
crates/vm/src/stdlib/ctypes/field.rs (2)
  • new (36-53)
  • size (259-261)
crates/vm/src/stdlib/ctypes/pointer.rs (2)
crates/vm/src/stdlib/ctypes/array.rs (1)
  • __mul__ (96-121)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (28-37)
crates/vm/src/stdlib/ctypes/thunk.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (8)
  • std (515-515)
  • std (523-523)
  • std (700-700)
  • std (703-703)
  • std (706-706)
  • ffi_type_from_str (25-48)
  • get_type_code (18-23)
  • t (22-22)
crates/vm/src/stdlib/ctypes/function.rs (1)
crates/vm/src/stdlib/ctypes/thunk.rs (3)
  • t (31-31)
  • new (212-297)
  • code_ptr (300-302)
crates/vm/src/stdlib/ctypes.rs (2)
crates/vm/src/stdlib/ctypes/base.rs (10)
  • std (515-515)
  • std (523-523)
  • std (700-700)
  • std (703-703)
  • std (706-706)
  • size (210-212)
  • new (253-259)
  • from_bytes (183-190)
  • t (22-22)
  • get_stg_info (238-250)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (28-37)
crates/vm/src/stdlib/ctypes/structure.rs (5)
crates/vm/src/stdlib/ctypes.rs (22)
  • field (669-669)
  • get_size (142-161)
  • mem (144-144)
  • mem (145-145)
  • mem (146-146)
  • mem (147-147)
  • mem (148-148)
  • mem (149-149)
  • mem (150-150)
  • mem (151-151)
  • mem (152-152)
  • mem (153-153)
  • mem (154-154)
  • mem (155-155)
  • mem (156-156)
  • mem (157-157)
  • mem (158-158)
  • mem (212-212)
  • size_of (391-439)
  • std (417-417)
  • std (434-434)
  • std (634-634)
crates/vm/src/stdlib/ctypes/base.rs (17)
  • new (253-259)
  • size (210-212)
  • __mul__ (382-384)
  • std (515-515)
  • std (523-523)
  • std (700-700)
  • std (703-703)
  • std (706-706)
  • from_stg_info (173-180)
  • _objects (224-226)
  • _objects (622-624)
  • from_address (720-747)
  • from_buffer (750-799)
  • buffer (963-963)
  • buffer (973-973)
  • from_buffer_copy (802-842)
  • as_buffer (985-993)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (28-37)
crates/vm/src/stdlib/ctypes/array.rs (11)
  • __mul__ (96-121)
  • std (271-271)
  • std (274-274)
  • std (277-277)
  • _objects (335-337)
  • from_address (495-530)
  • from_buffer (533-598)
  • buffer (762-762)
  • buffer (772-772)
  • from_buffer_copy (601-655)
  • as_buffer (784-792)
crates/vm/src/stdlib/ctypes/pointer.rs (7)
  • __mul__ (23-40)
  • std (29-29)
  • std (149-149)
  • std (190-190)
  • from_address (122-132)
  • from_buffer (135-176)
  • from_buffer_copy (179-212)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Run tests under miri
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (windows-latest)
🔇 Additional comments (22)
.github/copilot-instructions.md (1)

47-47: LGTM!

The feature flag example update from ssl to jit is consistent with the JIT support section documented later in the file.

crates/vm/src/stdlib/ctypes/util.rs (1)

1-38: LGTM!

Clean implementation of StgInfo that mirrors CPython's storage information pattern. The struct appropriately captures size, alignment, and metadata needed for ctypes types. The Default implementation with align: 1 is correct as a minimal alignment constraint.

crates/vm/src/stdlib/ctypes/function.rs (2)

274-315: LGTM! Proper thunk lifecycle management for Python callables.

The implementation correctly:

  1. Creates a PyCThunk to wrap the Python callable
  2. Extracts and parses _argtypes_ and _restype_ from the class
  3. Stores the thunk as a PyRef<PyCThunk> in the handler field to keep the closure alive
  4. Sets needs_free: AtomicCell::new(true) to indicate resource ownership

The thunk's CodePtr is properly extracted and stored in ptr for FFI calls.


78-135: LGTM!

Comprehensive handling of FFI return type conversion covering all ctypes type codes. The fallback to i32 interpretation when _type_ is absent provides sensible default behavior for FFI returns.

crates/vm/src/stdlib/ctypes/base.rs (4)

156-213: LGTM! Clean CDataObject abstraction.

The CDataObject struct provides a well-designed backing store for ctypes data:

  • from_stg_info for zero-initialized buffers
  • from_bytes for pre-existing data
  • from_base for sub-object scenarios

The separation of concerns between buffer storage and object references is appropriate.


510-525: LGTM! Platform-dependent sizing correctly implemented.

The c_long and c_ulong handling now uses libc::c_long / libc::c_ulong with std::mem::size_of for correct platform-dependent sizing, addressing the previous concern about hardcoded 4-byte sizes.


719-747: LGTM!

The from_address implementation correctly reads bytes from the provided memory address, with appropriate null pointer checks and safety documentation. This is the expected behavior for ctypes address-based construction.


844-917: LGTM! Thorough in_dll implementation.

The implementation correctly:

  1. Handles both integer handles and CDLL-like objects with _handle attribute
  2. Uses proper null-terminated symbol names
  3. Reads symbol values with appropriate safety considerations
  4. Stores the dll reference in cdata.base to prevent premature unloading
crates/vm/src/stdlib/ctypes/thunk.rs (2)

183-303: Well-structured thunk implementation.

The PyCThunk::new implementation follows a sound pattern:

  • Parses argument and result types correctly
  • Creates FFI CIF with appropriate types
  • Manages userdata lifetime via Box::into_raw with cleanup in Drop
  • Stores both closure and userdata together to ensure proper lifecycle

The use of 'static lifetime for the userdata reference is safe because ThunkData maintains ownership of the leaked box and cleans it up on drop.


305-310: Based on my investigation, I've identified significant evidence that supports the original review concern, though I cannot directly access the codebase for complete verification.

Key findings from external sources:

  1. libffi closure thread-safety (confirmed via libffi documentation): Closures are NOT inherently thread-safe. The documented requirements include:

    • ffi_prep_cif must be called before threads start or with serialization
    • Closure objects should not be concurrently allocated/freed
    • User-data/environment must be made explicitly thread-safe
    • libffi does not automatically make captured data Send/Sync
  2. RustPython VM threading model (confirmed via RustPython embedding guidance):

    • RustPython's VM is not thread-safe for arbitrary concurrent access
    • VM APIs must be called from within vm.enter() context on the VM's thread
    • Calling VM methods from arbitrary threads violates the threading contract

Critical concern: The callback executes from C code (libffi), potentially from an arbitrary thread. If it then calls with_current_vm to invoke the Python callable, this violates RustPython's threading model unless there's explicit synchronization not reflected in the comment.

The justification comment "thunk_data contains the libffi closure which is heap-allocated" is insufficient. Heap allocation alone does not provide thread-safety—explicit synchronization mechanisms or thread-pinning are required.


Verify that the callback mechanism includes proper synchronization (e.g., channels, locks, or thread-pinning) to ensure with_current_vm is only invoked from the VM's thread, not from arbitrary C threads spawned by libffi.

crates/vm/src/stdlib/ctypes.rs (5)

184-347: Well-structured type conversion with proper platform-aware sizes.

The implementation correctly uses mem::size_of for each C type, addressing the previous platform-dependent size concern. The pattern of using const SIZE with from_ne_bytes is clean and efficient.


371-378: LGTM - proper CData initialization.

The simple type now correctly allocates a zero-initialized buffer of the appropriate size based on the type character.


389-439: Good coverage of ctypes size computation.

The function handles various ctypes types comprehensively. The logic correctly distinguishes between instances (using cdata.read().size()) and type objects (using stg_info or computing from _type_).


163-166: Based on my verification using the web search, I can now provide my assessment:

The alignment assumption in get_align is technically incorrect for some platforms.

The web search confirms the review comment's concern is accurate: on 32-bit x86 Linux, c_double has 8-byte size but only 4-byte alignment (GCC default behavior per the i386 ABI). This means the assumption that "alignment equals size" does not hold universally.

However, I cannot access the repository to fully verify:

  • How get_align is actually used in the codebase
  • Whether this causes actual bugs or is a documented limitation
  • If tests or other code already account for this

The original review comment appropriately notes this is consistent with the existing approach, suggesting it may be an accepted trade-off rather than an undiscovered issue.


763-810: Based on my verification, the review comment is accurate. Let me provide the final rewritten review:


Cast implementation correctly matches CPython's permissive reinterpret_cast behavior.

The function properly validates that the destination type is a ctypes pointer type (via _type_ attribute check) and accepts pointer-convertible source objects (CData instances, arrays, and integers). It intentionally does not validate type compatibility between source and destination, which aligns with CPython's design—cast() exists precisely to bypass strict type checks and reinterpret memory. This implementation is sound and correct.

crates/vm/src/stdlib/ctypes/pointer.rs (2)

74-88: Constructor allows any value as initial contents.

The constructor accepts any PyObjectRef as initial contents without validating it against the pointer's _type_. This is consistent with CPython's behavior but could lead to confusing runtime errors.


121-132: Pointer from_address correctly stores address without dereferencing.

Unlike Structure.from_address which reads memory at the address, _Pointer.from_address correctly just stores the address value. This matches CPython's pointer semantics.

crates/vm/src/stdlib/ctypes/structure.rs (2)

463-497: Buffer protocol implementation looks correct.

The AsBuffer implementation properly exposes the structure's internal buffer through the Python buffer protocol. The use of mapped lock guards ensures the buffer reference remains valid while accessed.


166-178: I'll help you verify the review comment about the overflow check in the __mul__ implementation. Let me start by examining the code context and searching for similar patterns in the ctypes module.
<function_calls>


#!/bin/bash

First, get context around lines 166-178 in structure.rs

wc -l crates/vm/src/stdlib/ctypes/structure.rs


</function_calls>
<function_calls>


#!/bin/bash

Read the section around line 165-180 to see the TODO and context

sed -n '160,185p' crates/vm/src/stdlib/ctypes/structure.rs


</function_calls>
<function_calls>


#!/bin/bash

Search for other places in ctypes where we multiply size/length values

that might need similar overflow checks

rg -n 'element_size.**|.checked_mul|overflow' crates/vm/src/stdlib/ctypes/ --type rs


</function_calls>

crates/vm/src/stdlib/ctypes/array.rs (3)

588-597: Good - from_buffer correctly tracks source buffer reference.

Unlike the structure implementation, this correctly stores a reference to the source buffer in cdata.objects, ensuring the source stays alive. This is the correct CPython behavior for from_buffer.


758-792: LGTM - consistent AsBuffer implementation.

The buffer protocol implementation follows the same correct pattern as PyCStructure, properly exposing the internal buffer through mapped lock guards.


505-529: Unsafe memory read matches CPython semantics.

Similar to PyCStructure.from_address, this correctly implements the unsafe from_address behavior. The caller is responsible for ensuring valid memory.

Comment on lines +60 to +64
fn python_to_ffi(obj: PyResult, ty: &PyTypeRef, result: *mut c_void, vm: &VirtualMachine) {
let obj = match obj {
Ok(o) => o,
Err(_) => return, // Exception occurred, leave result as-is
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Exception in callable leaves FFI result uninitialized.

When the Python callable raises an exception, python_to_ffi returns early without writing to result. The C caller will receive whatever garbage was in the result buffer. Consider zero-initializing the result on error:

let obj = match obj {
    Ok(o) => o,
    Err(_) => {
        // Zero-initialize result on error
        unsafe { std::ptr::write_bytes(result, 0, 1); }
        return;
    }
};

Note: CPython sets an exception flag that the caller should check; this may be acceptable if RustPython has similar semantics.

🤖 Prompt for AI Agents
crates/vm/src/stdlib/ctypes/thunk.rs lines 60-64: when the Python callable
returns Err the function returns without writing to the FFI result buffer,
leaving uninitialized garbage for the C caller; fix by zero-initializing the
result buffer before returning on error (use an unsafe pointer write that fills
result with zeros for the appropriate size of the target type) so the C side
sees a deterministic value, and keep existing exception propagation semantics.

Comment on lines +173 to +177
let size = PyCUnionType::get_field_size(&field_type, vm)?;
max_size = max_size.max(size);
// For simple types, alignment == size
max_align = max_align.max(size);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Alignment calculation assumes alignment equals size, which may be incorrect.

Setting max_align = max_align.max(size) assumes every field's alignment equals its size. This isn't true for composite types like arrays (where alignment is the element's alignment, not the array size) or structures with padding.

Consider using the type's actual alignment:

// For simple types, get alignment from _type_ attribute or stg_info
let align = PyCUnionType::get_field_align(&field_type, vm)?;
max_align = max_align.max(align);
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/union.rs around lines 173 to 177, the code
incorrectly assumes field alignment equals field size when computing max_align;
instead obtain each field's actual alignment (e.g., via a new or existing
PyCUnionType::get_field_align(&field_type, vm)? helper or reading the type's
stg_info/_type_/alignment) and use that value to update max_align (max_align =
max_align.max(align)); ensure the helper handles composite types (arrays,
structs) and returns the element/contained type alignment rather than array
size.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/ctypes.rs (1)

563-571: addressof returns the address of the Python value cell, not the underlying C buffer

With the new CDataObject backing, addressof on _SimpleCData now does:

let simple = obj.downcast_ref::<PyCSimple>().unwrap();
Ok(simple.value.as_ptr() as usize)

This is the address of the AtomicCell<PyObjectRef> storage, not the C data buffer, and ignores the new cdata field. For FFI and CPython‑compatible semantics, this should point at the C storage (e.g., simple.cdata.read().buffer.as_ptr()), and it currently rejects all non‑simple ctypes instances.

Refactor addressof to use the CDataObject buffer (ideally via PyCData/cdata, so it works for structures/arrays/unions as well), and only fall back to a TypeError if the object is not a ctypes instance.

crates/vm/src/stdlib/ctypes/base.rs (1)

142-145: c_wchar ("u") uses different sizes in value_to_bytes vs get_size/bytes_to_pyobject

get_size("u") (and bytes_to_pyobject’s "u" arm) use mem::size_of::<WideChar>(), but value_to_bytes hardcodes 4 bytes for "u":

"u" => {
    // c_wchar - 4 bytes (wchar_t on most platforms)return (c as u32).to_ne_bytes().to_vec();
}

On platforms where wchar_t/WideChar is 2 bytes (notably Windows), the CDataObject buffer for _SimpleCData "u" instances will be 4 bytes while get_size("u") and the from_address / from_buffer paths expect 2 bytes. This leads to inconsistent sizeof, buffer lengths, and memory reads.

Make the "u" branch derive its width from get_size("u") (or mem::size_of::<WideChar>()), e.g. by writing either a 2‑byte u16 or 4‑byte u32 depending on the configured size, and zero‑filling to that size when conversion fails.

Also applies to: 448-456

♻️ Duplicate comments (1)
crates/vm/src/stdlib/ctypes/thunk.rs (1)

59-64: Result buffer is left uninitialized when the Python callable raises

If the wrapped Python callable returns Err, python_to_ffi simply returns without touching result, so C callers see whatever garbage was in the result slot previously.

Even if the C side is expected to check for a Python exception, it’s safer and more deterministic to zero‑initialize the result buffer (according to the result type) before returning on error. This was already noted in earlier review; keeping it in mind while refining the thunk API would be beneficial.

🧹 Nitpick comments (5)
crates/vm/src/stdlib/ctypes.rs (1)

763-810: _cast only mutates .contents and never touches pointer backing; semantics may diverge from CPython

pycfunction_cast constructs an instance of ctype and then implements cast by doing:

let result = ctype.call((), vm)?;
…
result.set_attr("contents", ptr_value, vm)?;

This treats cast essentially as “make a fresh ctypes instance and assign its contents attribute”:

  • It doesn’t validate that ctype is actually a pointer type (only checks _type_ exists).
  • It doesn’t interact with CDataObject / buffer storage, so any code that relies on pointer value being encoded in the backing cdata may not see the casted address.
  • For non‑pointer types that define a contents attribute, this will behave oddly.

If you intend to match CPython’s ctypes.cast, consider:

  • Verifying ctype is a subclass of _Pointer, not just something with _type_.
  • Setting the pointer value in the underlying cdata representation instead of (or in addition to) assigning contents.
  • Adding tests that mirror CPython’s ctypes.cast behaviour for ctypes instances vs integer addresses.
crates/vm/src/stdlib/ctypes/base.rs (2)

566-569: Pointer-like simple types expose zeroed buffers independent of their value

For "P" | "z" | "Z" the new value_to_bytes implementation currently returns:

// Pointer types (platform pointer size)
vec![0; std::mem::size_of::<usize>()]

ignoring the actual Python value. That means memoryviews and any buffer‑based consumers of _SimpleCData pointer types will always see a zeroed buffer, even when the object holds a non‑NULL address or string.

If you intend the buffer protocol to reflect the underlying C representation, consider encoding:

  • For "P": the integer address (or 0 for None) as a native‑endian usize.
  • For "z"/"Z": either the address of the underlying bytes/wchar buffer (if you model it), or at least an integer address consistent with bytes_to_pyobject/from_address.

628-672: Unsafe direct access to AtomicCell<PyObjectRef> is a concurrency footgun

PyCSimple::value clones the stored value via:

let raw_value = unsafe { (*zelf.value.as_ptr()).clone() };

This bypasses AtomicCell’s safe API and relies on manual unsafe pointer reads. While ctypes values are unlikely to be mutated concurrently in practice, this pattern is fragile and easy to get wrong if the representation evolves.

Consider replacing this with a safer container (e.g., PyRwLock<PyObjectRef>) or wrapping the unsafety into a small, well‑documented helper so that concurrent mutations and future refactors don’t accidentally introduce UB.

crates/vm/src/stdlib/ctypes/pointer.rs (1)

75-213: PyCPointer.contents mixes raw addresses and referent objects

The new constructor, __init__, from_address, from_buffer*, and in_dll all write different kinds of values into contents:

  • py_new/__init__ store an arbitrary PyObjectRef (typically a ctypes instance).
  • from_address/from_buffer*/in_dll store an integer address in the same field.

Downstream code will need to distinguish “pointer to object” vs “raw address” based solely on runtime type, which is error‑prone. It may be clearer to standardize representation (e.g., keep the raw address in cdata and treat contents purely as the dereferenced value, or vice versa) and have helpers consistently translate between the two.

crates/vm/src/stdlib/ctypes/structure.rs (1)

140-156: Structure layout tracks alignment but never pads fields or final size

get_field_align computes an alignment for each field (defaulting to 1), and PyCStructure::py_new keeps a max_align and stores it in StgInfo. However, field offsets are still computed as a simple running sum of sizes:

let mut offset = 0usize;
…
offset += size;
…
total_size = offset;
let mut stg_info = StgInfo::new(total_size, max_align);

There’s no per‑field padding to alignment boundaries and no rounding of total_size up to max_align, so the actual in‑memory layout is tightly packed regardless of the computed alignments. If you intend to approximate C struct layout (as CPython ctypes.Structure does), you’ll likely need to:

  • Align each field’s offset up to its alignment, and
  • Round the final total_size up to max_align.

Otherwise, the tracked alignment metadata may be misleading.

Also applies to: 245-295

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0d067e and b8c7509.

📒 Files selected for processing (6)
  • crates/vm/src/stdlib/ctypes.rs (7 hunks)
  • crates/vm/src/stdlib/ctypes/base.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (9 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/vm/src/stdlib/ctypes/util.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style in Rust code (cargo fmt to format)
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes.rs
🧠 Learnings (4)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-11-25T11:05:02.899Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.899Z
Learning: Applies to **/src/**/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
🧬 Code graph analysis (4)
crates/vm/src/stdlib/ctypes/thunk.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (8)
  • std (516-516)
  • std (524-524)
  • std (701-701)
  • std (704-704)
  • std (707-707)
  • ffi_type_from_str (25-48)
  • get_type_code (18-23)
  • t (22-22)
crates/vm/src/stdlib/ctypes/pointer.rs (3)
crates/vm/src/stdlib/ctypes/structure.rs (1)
  • __mul__ (159-179)
crates/vm/src/stdlib/ctypes/array.rs (1)
  • __mul__ (96-121)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
crates/vm/src/stdlib/ctypes/structure.rs (2)
crates/vm/src/stdlib/ctypes/base.rs (11)
  • new (254-260)
  • size (210-212)
  • std (516-516)
  • std (524-524)
  • std (701-701)
  • std (704-704)
  • std (707-707)
  • instance (913-913)
  • from_stg_info (173-180)
  • buffer (964-964)
  • buffer (974-974)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
crates/vm/src/stdlib/ctypes.rs (7)
crates/vm/src/stdlib/ctypes/base.rs (9)
  • std (516-516)
  • std (524-524)
  • std (701-701)
  • std (704-704)
  • std (707-707)
  • size (210-212)
  • new (254-260)
  • t (22-22)
  • get_stg_info (239-251)
crates/vm/src/stdlib/ctypes/pointer.rs (3)
  • std (30-30)
  • std (150-150)
  • std (191-191)
crates/vm/src/stdlib/ctypes/structure.rs (5)
  • std (137-137)
  • std (166-166)
  • field (76-77)
  • field (259-259)
  • field_type (274-275)
crates/vm/src/stdlib/ctypes/array.rs (3)
  • std (271-271)
  • std (274-274)
  • std (277-277)
crates/vm/src/stdlib/ctypes/field.rs (7)
  • size (259-261)
  • new (36-53)
  • name (279-281)
  • obj (107-107)
  • obj (116-116)
  • obj (201-201)
  • obj (218-218)
crates/vm/src/stdlib/ctypes/thunk.rs (2)
  • new (213-298)
  • t (31-31)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (macos-latest)
🔇 Additional comments (4)
crates/vm/src/stdlib/ctypes/base.rs (1)

571-618: CDataObject-backed storage and buffer implementation for _SimpleCData look consistent

The introduction of CDataObject plus PyCSimple.cdata and the SIMPLE_BUFFER_METHODS/AsBuffer implementation cleanly route buffer access through the shared CData backing, with read/write views correctly mapped via PyMappedRwLock*Guard.

This design should work well for both value updates (set_value updating cdata.buffer) and external memoryview/buffer users.

Also applies to: 960-995

crates/vm/src/stdlib/ctypes/pointer.rs (1)

21-41: Pointer __mul__ and StgInfo integration look good

PyCPointerType::__mul__ computes element size from size_of::<usize>(), populates StgInfo (including length), and returns a PyCArrayType with typ set to the original pointer type and the expected length/element_size.

This mirrors the array/structure patterns and should give consistent sizing/alignment for arrays of pointers.

crates/vm/src/stdlib/ctypes/structure.rs (1)

464-499: Buffer protocol integration for structures looks correct

STRUCTURE_BUFFER_METHODS and the AsBuffer impl for PyCStructure consistently expose the underlying CDataObject.buffer as a writable, simple buffer, matching the new CDataObject‑backed design.

This should make memoryview(Structure()) and similar operations work as expected.

crates/vm/src/stdlib/ctypes/thunk.rs (1)

105-113: Based on the web search results, the technical concern raised in the review is valid and well-founded. The search confirms that:

  • On LP64 systems (Linux, macOS): c_long = 8 bytes
  • On LLP64 systems (Windows x64): c_long = 4 bytes
  • i64 is always 8 bytes

This means writing i64 values into memory slots where libffi expects c_long would indeed create undefined behavior on Windows 64-bit and 32-bit platforms.

However, I cannot access the repository to verify the actual current state of the code due to persistent cloning failures. Therefore, I cannot confirm:

  1. Whether the code at lines 105-113 still has the problematic pattern shown
  2. Whether lines 34-55 in ffi_to_python have been updated to use c_long/c_ulong
  3. Whether ffi_type_from_str matches the actual storage size
  4. The full context of error handling paths

The review comment's technical analysis is sound, but I cannot verify the actual current code state or whether the issue still exists due to infrastructure limitations. Manual verification of the current code is needed to confirm the asymmetry persists and whether the suggested fix should be applied.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/vm/src/stdlib/ctypes/base.rs (2)

54-62: Duplicate condition in set_primitive.

The check for PyBytes with length 1 is duplicated on lines 54-57 and 58-61. This appears to be a copy-paste error.

         "c" => {
             if value
                 .clone()
                 .downcast_exact::<PyBytes>(vm)
                 .is_ok_and(|v| v.len() == 1)
-                || value
-                    .clone()
-                    .downcast_exact::<PyBytes>(vm)
-                    .is_ok_and(|v| v.len() == 1)
                 || value
                     .clone()
                     .downcast_exact::<PyInt>(vm)

35-45: Perfect. The web searches confirm the review comment is correct. Here's what I've verified:

  1. libffi-rs provides the required methods: Type::c_long(), Type::c_ulong(), and Type::pointer() all exist in the libffi-rs crate.

  2. Format code mappings are correct:

    • 'l' = c_long, 'L' = c_ulong
    • 'z' = c_char_p, 'Z' = c_wchar_p, 'P' = c_void_p (all pointer types)
  3. The bug is real and critical:

    • Hardcoding i32/u32 for 'l'/'L' is wrong on 64-bit systems where c_long is 8 bytes
    • Hardcoding u64 for pointers ('z'/'Z'/'P') breaks on 32-bit systems where pointers are 4 bytes
    • libffi correctly treats all pointer types uniformly as FFI_TYPE_POINTER
  4. The suggested fix is valid: Using Type::c_long(), Type::c_ulong(), and Type::pointer() will correctly adapt to the host platform.


Platform-dependent FFI type sizes are hardcoded incorrectly.

c_long/c_ulong mapped to i32/u32 are 8 bytes on 64-bit platforms. Pointer types z/Z/P hardcoded to u64 fail on 32-bit platforms where pointers are 4 bytes. libffi-rs provides platform-aware methods that should be used instead:

-        "l" => Some(libffi::middle::Type::i32()),
-        "L" => Some(libffi::middle::Type::u32()),
+        "l" => Some(libffi::middle::Type::c_long()),
+        "L" => Some(libffi::middle::Type::c_ulong()),
         "q" => Some(libffi::middle::Type::i64()),
         "Q" => Some(libffi::middle::Type::u64()),
         "f" => Some(libffi::middle::Type::f32()),
         "d" => Some(libffi::middle::Type::f64()),
         "g" => Some(libffi::middle::Type::f64()),
         "?" => Some(libffi::middle::Type::u8()),
-        "z" => Some(libffi::middle::Type::u64()),
-        "Z" => Some(libffi::middle::Type::u64()),
-        "P" => Some(libffi::middle::Type::u64()),
+        "z" => Some(libffi::middle::Type::pointer()),
+        "Z" => Some(libffi::middle::Type::pointer()),
+        "P" => Some(libffi::middle::Type::pointer()),
♻️ Duplicate comments (3)
crates/vm/src/stdlib/ctypes/union.rs (1)

198-214: from_address creates zeroed buffer instead of reading from address.

This implementation allocates a zero-initialized buffer rather than reading the actual data from the provided memory address. This is inconsistent with the expected behavior of from_address.

This issue was flagged in a previous review but appears to still be present. The implementation should read data from the address:

     fn from_address(cls: PyTypeRef, address: isize, vm: &VirtualMachine) -> PyResult {
         use crate::stdlib::ctypes::_ctypes::size_of;

         // Get size from cls
         let size = size_of(cls.clone().into(), vm)?;

         // Create instance with data from address
         if address == 0 || size == 0 {
             return Err(vm.new_value_error("NULL pointer access".to_owned()));
         }
-        let stg_info = StgInfo::new(size, 1);
+        
+        // SAFETY: Caller must ensure address points to valid memory
+        let data = unsafe {
+            let ptr = address as *const u8;
+            std::slice::from_raw_parts(ptr, size).to_vec()
+        };
+
         Ok(PyCUnion {
-            cdata: PyRwLock::new(CDataObject::from_stg_info(&stg_info)),
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
         }
crates/vm/src/stdlib/ctypes/thunk.rs (1)

60-64: Exception in callable leaves FFI result uninitialized.

When the Python callable raises an exception, the function returns without writing to result, leaving uninitialized memory that the C caller may read.

This was flagged in a previous review. Consider zero-initializing:

     let obj = match obj {
         Ok(o) => o,
-        Err(_) => return, // Exception occurred, leave result as-is
+        Err(_) => {
+            // Zero-initialize result on error to avoid undefined behavior
+            // Note: We don't know the exact size, but zeroing a reasonable max is safer
+            unsafe { std::ptr::write_bytes(result, 0, std::mem::size_of::<u64>()); }
+            return;
+        }
     };
crates/vm/src/stdlib/ctypes/array.rs (1)

750-756: Critical: to_arg creates a dangling pointer.

The Arg::new(&cdata.buffer) borrows from cdata which is a RwLockReadGuard. The guard is dropped at the end of the function, but the returned Arg still holds a reference to the buffer data, causing undefined behavior.

This was flagged in a previous review. The Arg holds a pointer to data that will be deallocated when the lock guard is dropped.

 impl PyCArray {
     #[allow(unused)]
-    pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<libffi::middle::Arg> {
-        let cdata = self.cdata.read();
-        Ok(libffi::middle::Arg::new(&cdata.buffer))
+    pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<(Vec<u8>, libffi::middle::Arg)> {
+        let buffer = self.cdata.read().buffer.clone();
+        // SAFETY: Caller must ensure buffer outlives the Arg
+        let arg = unsafe { libffi::middle::Arg::new(&buffer) };
+        Ok((buffer, arg))
     }
 }
🧹 Nitpick comments (4)
crates/vm/src/stdlib/ctypes.rs (1)

163-166: get_align assumes alignment equals size, which may be incorrect for some types.

For most simple C types this holds, but some platforms may have different alignment requirements (e.g., double on 32-bit x86 may have 4-byte alignment despite 8-byte size).

Consider documenting this assumption or using proper alignment intrinsics if precision is needed:

 /// Get alignment for a simple type - for C types, alignment equals size
+/// Note: This is a simplification that works for most platforms.
+/// Some exotic platforms may have different alignment requirements.
 pub fn get_align(ty: &str) -> usize {
     get_size(ty)
 }
crates/vm/src/stdlib/ctypes/pointer.rs (1)

75-89: Constructor and __init__ have overlapping initialization logic.

Both py_new (Constructor) and __init__ can set the contents value. When Python calls the constructor, py_new runs first setting contents, then __init__ runs potentially overwriting it with the same value. This is redundant but not incorrect.

Consider having py_new always initialize with None and letting __init__ handle the actual initialization:

 fn py_new(cls: PyTypeRef, args: Self::Args, vm: &VirtualMachine) -> PyResult {
-    // Get the initial contents value if provided
-    let initial_contents = args.0.into_option().unwrap_or_else(|| vm.ctx.none());
-
-    // Create a new PyCPointer instance with the provided value
     PyCPointer {
-        contents: PyRwLock::new(initial_contents),
+        contents: PyRwLock::new(vm.ctx.none()),
     }
     .into_ref_with_type(vm, cls)
     .map(Into::into)
 }
crates/vm/src/stdlib/ctypes/array.rs (1)

657-747: Significant code duplication with PyCArrayType::in_dll.

The PyCArray::in_dll implementation (lines 657-747) duplicates most of the logic from PyCArrayType::in_dll (lines 123-202). Both methods perform identical steps for handle extraction, library lookup, and symbol resolution.

Consider extracting the common logic into a shared helper function:

fn get_symbol_address(
    dll: PyObjectRef,
    name: &str,
    vm: &VirtualMachine,
) -> PyResult<usize> {
    // Common handle extraction and symbol lookup logic
    // ...
}

This would reduce duplication and make maintenance easier.

crates/vm/src/stdlib/ctypes/structure.rs (1)

372-464: Consider extracting common from_buffer* patterns.

The from_buffer, from_buffer_copy, and related methods follow the same pattern across PyCArray, PyCStructure, and PyCPointer:

  1. Validate offset
  2. Get buffer/size
  3. Check buffer bounds
  4. Read/copy data
  5. Create instance

Consider creating helper functions in a utility module to reduce duplication:

// In util.rs
fn validate_buffer_offset(offset: isize) -> PyResult<usize> { ... }
fn check_buffer_bounds(buffer_len: usize, offset: usize, size: usize, vm: &VirtualMachine) -> PyResult<()> { ... }
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8c7509 and 8933738.

📒 Files selected for processing (9)
  • crates/vm/src/stdlib/ctypes.rs (7 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (13 hunks)
  • crates/vm/src/stdlib/ctypes/base.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (9 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/vm/src/stdlib/ctypes/util.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style in Rust code (cargo fmt to format)
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/array.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/array.rs
🧠 Learnings (5)
📚 Learning: 2025-11-25T11:05:02.899Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.899Z
Learning: Applies to **/src/**/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (2)
crates/vm/src/stdlib/ctypes.rs (4)
crates/vm/src/stdlib/ctypes/array.rs (3)
  • std (271-271)
  • std (274-274)
  • std (277-277)
crates/vm/src/stdlib/ctypes/base.rs (8)
  • std (548-548)
  • std (556-556)
  • std (761-761)
  • std (764-764)
  • std (767-767)
  • size (213-215)
  • new (257-263)
  • t (22-22)
crates/vm/src/stdlib/ctypes/field.rs (2)
  • size (259-261)
  • new (36-53)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
crates/vm/src/stdlib/ctypes/array.rs (3)
crates/vm/src/stdlib/ctypes.rs (4)
  • std (417-417)
  • std (448-448)
  • std (696-696)
  • size_of (391-453)
crates/vm/src/stdlib/ctypes/base.rs (11)
  • std (548-548)
  • std (556-556)
  • std (761-761)
  • std (764-764)
  • std (767-767)
  • new (257-263)
  • from_bytes (186-193)
  • buffer (1022-1022)
  • buffer (1032-1032)
  • instance (971-971)
  • size (213-215)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (13)
crates/vm/src/stdlib/ctypes/base.rs (1)

1018-1041: LGTM: Buffer methods implementation is correct.

The use of mapped lock guards ensures safe access to the underlying buffer while maintaining proper lifetime semantics.

crates/vm/src/stdlib/ctypes/function.rs (1)

274-315: Callback/thunk creation looks correct.

The thunk lifecycle is properly managed by storing thunk_ref in the handler field, ensuring the closure remains alive while the PyCFuncPtr exists.

crates/vm/src/stdlib/ctypes/union.rs (1)

299-327: LGTM: AsBuffer implementation is consistent with PyCSimple.

The buffer methods follow the same correct pattern used in SIMPLE_BUFFER_METHODS.

crates/vm/src/stdlib/ctypes/thunk.rs (1)

175-182: LGTM: ThunkData Drop correctly reclaims leaked memory.

The Drop implementation properly reclaims the Box-leaked userdata, preventing memory leaks.

crates/vm/src/stdlib/ctypes.rs (3)

184-347: LGTM: bytes_to_pyobject correctly uses platform-specific types.

The implementation properly handles platform-dependent sizes for c_long, c_ulong, and other types by using the actual C type sizes via mem::size_of.


389-453: LGTM: size_of now properly handles struct/union types.

The function now covers struct and union type objects by delegating to calculate_struct_size and calculate_union_size, addressing the previous review concern.


825-872: Cast implementation delegates to pointer type correctly.

The _cast function creates a pointer instance and sets its contents, which is the expected pattern for ctypes casting.

crates/vm/src/stdlib/ctypes/array.rs (3)

588-597: Good: Source reference is now tracked in from_buffer.

The implementation correctly stores buffer.obj.clone() in the CDataObject, keeping the source buffer alive. This addresses the CPython semantics for from_buffer.


505-529: Unsafe pointer read with minimal validation.

The from_address method only checks for NULL pointer but cannot validate that the address points to valid, readable memory. This matches CPython's behavior where it's the caller's responsibility to ensure address validity, but it means invalid addresses will cause undefined behavior (likely segfault).

This is expected behavior for ctypes FFI. The comment documenting the safety requirements would be helpful but not required.


758-791: AsBuffer implementation looks correct.

The buffer methods properly map through the PyRwLock guards to provide zero-copy access to the underlying cdata.buffer. Setting readonly=false is appropriate for ctypes objects.

crates/vm/src/stdlib/ctypes/structure.rs (2)

414-420: Good: Source reference tracking implemented.

The from_buffer method now correctly stores Some(source) in the CDataObject, keeping the source buffer alive as per CPython semantics.


467-501: AsBuffer implementation is consistent with PyCArray.

The buffer methods correctly expose the structure's memory buffer through the lock guards, matching the pattern used in array.rs.

crates/vm/src/stdlib/ctypes/pointer.rs (1)

246-260: Let me search for libloading API documentation:
<function_calls>


libloading crate Library::get method documentation Rust

</function_calls>

Comment on lines +792 to +793
let value = if address != 0 && size > 0 {
// Safety: This is inherently unsafe - reading from arbitrary memory address
unsafe {
let ptr = address as *const u8;
let bytes = std::slice::from_raw_parts(ptr, size);
// Convert bytes to appropriate Python value based on type
bytes_to_pyobject(&cls, bytes, vm)?
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Unsafe memory read lacks alignment check.

Reading from an arbitrary address without verifying alignment can cause undefined behavior on platforms requiring aligned access. The comment acknowledges the unsafety but doesn't document alignment requirements.

Consider adding an alignment check:

         let value = if address != 0 && size > 0 {
             // Safety: This is inherently unsafe - reading from arbitrary memory address
+            // Caller must ensure address is properly aligned for the type
             unsafe {
                 let ptr = address as *const u8;
+                // Note: Reading as u8 bytes avoids alignment issues
                 let bytes = std::slice::from_raw_parts(ptr, size);
                 // Convert bytes to appropriate Python value based on type
                 bytes_to_pyobject(&cls, bytes, vm)?
             }

Alternatively, add a runtime alignment check:

let align = get_align(&type_str);
if (address as usize) % align != 0 {
    return Err(vm.new_value_error("address is not properly aligned".to_owned()));
}

Comment on lines +854 to +852
// Create instance
let instance = PyCSimple::py_new(cls.clone(), (OptionalArg::Present(value),), vm)?;

// TODO: Store reference to source in _objects to keep buffer alive
Ok(instance)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

from_buffer doesn't preserve source buffer reference.

The TODO indicates a missing implementation. The source buffer object should be stored in _objects to prevent garbage collection while the ctypes object is alive. Without this, the underlying memory could be freed while still referenced.

         // Create instance
         let instance = PyCSimple::py_new(cls.clone(), (OptionalArg::Present(value),), vm)?;

-        // TODO: Store reference to source in _objects to keep buffer alive
+        // Store reference to source buffer in _objects to keep it alive
+        if let Ok(simple_ref) = instance.clone().downcast::<PyCSimple>() {
+            simple_ref.cdata.write().objects = Some(source);
+        }
+
         Ok(instance)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/base.rs around lines 854–858, the created ctypes
instance does not retain a reference to the source buffer per the TODO; to fix,
obtain or create the instance's internal _objects container (e.g., ensure
_objects is a list on the instance), append the source buffer object to that
list so the VM holds a strong reference, and then return the instance—make sure
you use the VM APIs that add a Python-level reference (not a temporary borrow)
so the buffer won't be GC'd while the ctypes object exists.

Comment on lines +135 to +177
#[pyclassmethod]
fn from_buffer(
cls: PyTypeRef,
source: PyObjectRef,
offset: crate::function::OptionalArg<isize>,
vm: &VirtualMachine,
) -> PyResult {
use crate::TryFromObject;
use crate::protocol::PyBuffer;

let offset = offset.unwrap_or(0);
if offset < 0 {
return Err(vm.new_value_error("offset cannot be negative".to_owned()));
}
let offset = offset as usize;
let size = std::mem::size_of::<usize>();

let buffer = PyBuffer::try_from_object(vm, source.clone())?;

if buffer.desc.readonly {
return Err(vm.new_type_error("underlying buffer is not writable".to_owned()));
}

let buffer_len = buffer.desc.len;
if offset + size > buffer_len {
return Err(vm.new_value_error(format!(
"Buffer size too small ({} instead of at least {} bytes)",
buffer_len,
offset + size
)));
}

// Read pointer value from buffer
let bytes = buffer.obj_bytes();
let ptr_bytes = &bytes[offset..offset + size];
let ptr_val = usize::from_ne_bytes(ptr_bytes.try_into().expect("size is checked above"));

Ok(PyCPointer {
contents: PyRwLock::new(vm.ctx.new_int(ptr_val).into()),
}
.into_ref_with_type(vm, cls)?
.into())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

from_buffer doesn't track source reference.

CPython's from_buffer keeps the source buffer alive by storing a reference. This implementation reads the data but doesn't store a reference to source, which means the source could be garbage collected while the pointer still references memory from it.

Since PyCPointer only stores the pointer value (not a view into the buffer), this is less critical than for arrays/structures. However, for consistency with CPython semantics, consider whether source tracking is needed.

🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/pointer.rs around lines 135-177, from_buffer
reads a pointer value from the given buffer but does not retain a reference to
the source object so the underlying buffer may be freed; update PyCPointer to
hold an owner/source PyObjectRef (or re-use existing ownership slot if present),
set that owner to source.clone() when constructing the PyCPointer in
from_buffer, and ensure the struct constructors, into_ref_with_type path and any
Drop/clone semantics preserve that reference so the buffer remains alive for the
lifetime of the PyCPointer instance.

Comment on lines +140 to +156
/// Get the alignment of a ctypes type
fn get_field_align(field_type: &PyObjectRef, vm: &VirtualMachine) -> usize {
// Try to get _type_ attribute for simple types
if let Some(align) = field_type
.get_attr("_type_", vm)
.ok()
.and_then(|type_attr| type_attr.str(vm).ok())
.and_then(|type_str| {
let s = type_str.to_string();
(s.len() == 1).then(|| get_size(&s)) // alignment == size for simple types
})
{
return align;
}
// Default alignment
1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Alignment calculation may be incomplete for complex types.

The get_field_align helper only handles simple types (where alignment equals size). For nested structures, arrays, or pointers, the default alignment of 1 may be incorrect and could cause misaligned field access.

Consider extending this to handle:

  • Pointer types (alignment = std::mem::align_of::<*const ()>())
  • Nested structures (use their maximum field alignment)
  • Arrays (use element alignment)
     fn get_field_align(field_type: &PyObjectRef, vm: &VirtualMachine) -> usize {
         // Try to get _type_ attribute for simple types
         if let Some(align) = field_type
             .get_attr("_type_", vm)
             .ok()
             .and_then(|type_attr| type_attr.str(vm).ok())
             .and_then(|type_str| {
                 let s = type_str.to_string();
                 (s.len() == 1).then(|| get_size(&s)) // alignment == size for simple types
             })
         {
             return align;
         }
+        // Pointer types have pointer alignment
+        if field_type.class().fast_issubclass(super::pointer::PyCPointer::static_type()) {
+            return std::mem::align_of::<*const ()>();
+        }
         // Default alignment
         1
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Get the alignment of a ctypes type
fn get_field_align(field_type: &PyObjectRef, vm: &VirtualMachine) -> usize {
// Try to get _type_ attribute for simple types
if let Some(align) = field_type
.get_attr("_type_", vm)
.ok()
.and_then(|type_attr| type_attr.str(vm).ok())
.and_then(|type_str| {
let s = type_str.to_string();
(s.len() == 1).then(|| get_size(&s)) // alignment == size for simple types
})
{
return align;
}
// Default alignment
1
}
/// Get the alignment of a ctypes type
fn get_field_align(field_type: &PyObjectRef, vm: &VirtualMachine) -> usize {
// Try to get _type_ attribute for simple types
if let Some(align) = field_type
.get_attr("_type_", vm)
.ok()
.and_then(|type_attr| type_attr.str(vm).ok())
.and_then(|type_str| {
let s = type_str.to_string();
(s.len() == 1).then(|| get_size(&s)) // alignment == size for simple types
})
{
return align;
}
// Pointer types have pointer alignment
if field_type.class().fast_issubclass(super::pointer::PyCPointer::static_type()) {
return std::mem::align_of::<*const ()>();
}
// Default alignment
1
}

Comment on lines 334 to 335
#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor))]
impl PyCStructure {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

AsBuffer trait implemented but not declared in with() clause.

The AsBuffer trait is implemented for PyCStructure (lines 492-501), but it's not listed in the #[pyclass] attribute's with() clause on line 334. Compare with PyCArray which has with(Constructor, AsSequence, AsBuffer).

-#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor))]
+#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor, AsBuffer))]
 impl PyCStructure {

Without this, the buffer protocol may not be properly exposed to Python.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor))]
impl PyCStructure {
#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor, AsBuffer))]
impl PyCStructure {
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/structure.rs around lines 334-335, the
PyCStructure implements the AsBuffer trait but the #[pyclass(..., with(...))]
attribute only lists Constructor; update the with() clause to include AsBuffer
(e.g., with(Constructor, AsBuffer)) to expose the buffer protocol, and ensure
the AsBuffer trait is in scope/imported similarly to the PyCArray example.

Comment on lines +306 to +311
// SAFETY: PyCThunk is safe to send/sync because:
// - callable is a PyObjectRef which is Send+Sync
// - thunk_data contains the libffi closure which is heap-allocated
// - code_ptr is just a pointer to executable memory
unsafe impl Send for PyCThunk {}
unsafe impl Sync for PyCThunk {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

libffi-rs Closure thread safe Send Sync documentation

💡 Result:

Short answer: libffi-rs's Closure is NOT thread‑safe — it is neither Send nor Sync. [1]

Why: the Closure type ties a userdata reference lifetime to the closure and exposes a raw C-callable code pointer; the docs show auto‑trait implementations include impl<'a> !Send and impl<'a> !Sync. Use the Closure only on the thread where its userdata and lifetime guarantees hold, or wrap/transfer safety yourself. [1][2]

Sources:

  • Closure docs (auto-traits: !Send, !Sync). [1]
  • libffi-rs crate overview (closure/userdata lifetimes and usage). [2]

🏁 Script executed:

# First, let's examine the thunk.rs file to understand the structure of PyCThunk
cd crates/vm/src/stdlib/ctypes || exit 1
wc -l thunk.rs

Repository: RustPython/RustPython

Length of output: 186


🌐 Web query:

libffi-rs Closure lifetime userdata thread safety constraints

💡 Result:

Short summary — constraints when using libffi-rs closures (lifetimes, userdata, thread-safety):

  • Ownership / lifetime

    • The Closure value (the Rust-side object) must live as long as C may call the generated function pointer. If the Closure is dropped while C still holds the function pointer+user_data, calls are UB. Keep the Closure alive (e.g. store it or leak it via Box::into_raw) for the callback lifetime. [1][2]
  • user_data pointer

    • The userdata passed to C is just a raw pointer; it must point to memory with a stable address for as long as C may use it. Typical pattern: Box the Rust state and pass Box::into_raw (and later reconstruct with Box::from_raw to free). Don’t pass pointers to stack values that may be moved/dropped. [1][2]
  • Thread-safety / Send+Sync

    • If C may call the callback on other threads you must ensure the closure/state types meet the required thread-safety. Use closure types that include Send and/or Sync bounds (the crate exposes ways to require Send/Sync on the closure trait object). The caller is responsible for ensuring the actual function pointer + user_data are thread-safe when calling from other threads. [1][3]
  • Reentrancy and mutability

    • If the callback needs interior mutability or will be called reentrantly/ concurrently, use synchronization (Mutex, atomic types) or require Fn + Sync/Send as appropriate. FnMut callbacks are not inherently thread-safe unless you enforce it. [1][7]
  • Safety notes

    • Creating closures for FFI involves unsafe invariants: ensuring proper lifetime, stable address for userdata, and correct Send/Sync guarantees. Follow the crate’s examples and documented safety comments. [1][2]

Sources:

  • libffi-rs documentation (closures, examples, safety notes). [1]
  • libffi-sys / libffi crate pages (low-level userdata and build/docs). [2]
  • ffi_closure crate docs (illustrates explicit Send/Sync closure options and typical patterns). [3]

If you want, tell me which exact libffi-rs API (ClosureN / high::ClosureX / an example) you’re using and whether callbacks may be called from other threads — I’ll give a concrete safe pattern (code) for that case.


🌐 Web query:

RustPython PyCThunk ctypes thread safety implementation

💡 Result:

Do you want (A) a short explanation of how RustPython currently implements ctypes / PyCThunk and its thread-safety status, (B) pointers to the RustPython source files to inspect/modify the PyCThunk/ctypes implementation, or (C) guidance on implementing thread-safe CFUNCTYPE-style callbacks (PyCThunk) in RustPython?


Perfect. The additional context confirms the issue. The libffi-rs documentation explicitly states that the caller is responsible for ensuring thread-safety, and if callbacks may be called from other threads, proper Send+Sync bounds must be enforced. Since Closure is !Send and !Sync, this responsibility cannot be delegated via an unsafe blanket impl.

The rewritten review comment I provided is accurate and stands as-is:


Fix unsafe Send+Sync impl for PyCThunk — libffi Closure is not thread-safe.

The safety comment is incorrect. libffi-rs::middle::Closure explicitly implements !Send and !Sync (negative trait bounds) because it ties a userdata reference lifetime to the closure with thread-safety constraints. Simply being heap-allocated does not guarantee thread safety.

The current unsafe impl violates these constraints:

// SAFETY: PyCThunk is safe to send/sync because:
// - callable is a PyObjectRef which is Send+Sync
// - thunk_data contains the libffi closure which is heap-allocated  ❌ INCORRECT
// - code_ptr is just a pointer to executable memory
unsafe impl Send for PyCThunk {}
unsafe impl Sync for PyCThunk {}

Either remove these implementations, wrap thunk_data in thread-safe synchronization (e.g., Mutex), or provide a valid safety argument with proper analysis of the Closure's userdata and lifetime guarantees.

🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/thunk.rs around lines 306-311, the unsafe blanket
impls marking PyCThunk as Send and Sync are incorrect because
libffi-rs::middle::Closure is explicitly !Send and !Sync; remove these unsafe
impls or make thunk_data properly thread-safe (e.g., wrap the Closure and any
userdata in a Mutex/Arc<Mutex<...>> or equivalent synchronization) or else
provide a detailed, correct safety justification that proves the Closure's
userdata and lifetimes make cross-thread access safe; implement one of these
fixes and update the safety comment to match the chosen approach.

Comment on lines +250 to +258
// Copy data from source buffer
let bytes = buffer.obj_bytes();
let data = bytes[offset..offset + size].to_vec();

Ok(PyCUnion {
cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
}
.into_ref_with_type(vm, cls)?
.into())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

from_buffer doesn't preserve source buffer reference.

Similar to PyCSimple::from_buffer, the source buffer object should be stored to prevent it from being garbage collected while the union references its memory.

         // Copy data from source buffer
         let bytes = buffer.obj_bytes();
         let data = bytes[offset..offset + size].to_vec();

         Ok(PyCUnion {
-            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, Some(source))),
         }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/union.rs around lines 250 to 258, the new
CDataObject is created with from_bytes(..., None) which drops the source buffer
reference and allows it to be GC'd while the union still references its memory;
change the construction to preserve the source buffer by passing a reference (or
cloned PyObject) into the CDataObject (e.g., call from_bytes(data,
Some(buffer.clone())) or assign the buffer to the CDataObject/PyCUnion field
that holds the original buffer) so the buffer stays alive for the lifetime of
the union.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/stdlib/ctypes/base.rs (1)

25-48: Hardcoded c_long/c_ulong as 32-bit in ffi_type_from_str is incorrect on 64-bit platforms.

Lines 35-36 map "l" and "L" to i32/u32, assuming 4-byte c_long/c_ulong. However, on 64-bit Unix platforms (Linux, macOS), c_long and c_ulong are 8 bytes.

Consider using platform-aware sizing:

         "I" => Some(libffi::middle::Type::u32()),
-        "l" => Some(libffi::middle::Type::i32()),
-        "L" => Some(libffi::middle::Type::u32()),
+        "l" => {
+            if std::mem::size_of::<libc::c_long>() == 4 {
+                Some(libffi::middle::Type::i32())
+            } else {
+                Some(libffi::middle::Type::i64())
+            }
+        }
+        "L" => {
+            if std::mem::size_of::<libc::c_ulong>() == 4 {
+                Some(libffi::middle::Type::u32())
+            } else {
+                Some(libffi::middle::Type::u64())
+            }
+        }
         "q" => Some(libffi::middle::Type::i64()),
♻️ Duplicate comments (13)
crates/vm/src/stdlib/ctypes/pointer.rs (1)

135-177: from_buffer doesn't retain source buffer reference.

While this is less critical for pointers (since only the pointer value—an integer—is copied), CPython's from_buffer semantics keep the source buffer alive. For consistency with PyCStructure and PyCUnion, consider storing the source reference.

         Ok(PyCPointer {
-            contents: PyRwLock::new(vm.ctx.new_int(ptr_val).into()),
+            contents: PyRwLock::new(vm.ctx.new_int(ptr_val).into()),
+            // Consider adding: source: PyRwLock<Option<PyObjectRef>>,
+            // and storing source.clone() here
         }
crates/vm/src/stdlib/ctypes/structure.rs (3)

364-370: from_address creates structure with empty fields map.

The structure is created without populating the fields map from the class's _fields_ attribute. If code relies on self.fields for field access, it may not work correctly. The same issue exists in from_buffer and from_buffer_copy.

Consider extracting field parsing to a shared helper and using it in all three methods:

// Helper to populate fields from class _fields_
fn populate_fields_from_class(cls: &PyTypeRef, vm: &VirtualMachine) -> PyResult<IndexMap<String, FieldInfo>> {
    // Same logic as in py_new
}

334-334: AsBuffer trait not declared in with() clause.

The AsBuffer trait is implemented for PyCStructure (lines 492-502), but it's not listed in the #[pyclass] attribute's with() clause. Without this, the buffer protocol may not be properly exposed to Python.

-#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor))]
+#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor, AsBuffer))]
 impl PyCStructure {

140-156: Alignment calculation incomplete for pointer and composite types.

The helper only handles simple types. For pointers, arrays, and nested structures, it returns 1, which may cause incorrect field layout.

Consider extending to handle more types:

     fn get_field_align(field_type: &PyObjectRef, vm: &VirtualMachine) -> usize {
         // Try to get _type_ attribute for simple types
         if let Some(align) = field_type
             .get_attr("_type_", vm)
             .ok()
             .and_then(|type_attr| type_attr.str(vm).ok())
             .and_then(|type_str| {
                 let s = type_str.to_string();
                 (s.len() == 1).then(|| get_size(&s))
             })
         {
             return align;
         }
+        // Pointer types have pointer alignment
+        if field_type.class().fast_issubclass(super::pointer::PyCPointer::static_type()) {
+            return std::mem::align_of::<*const ()>();
+        }
         // Default alignment
         1
     }
crates/vm/src/stdlib/ctypes/function.rs (1)

111-112: Platform mismatch: c_long/c_ulong return types read as 8 bytes on all platforms.

Lines 111-112 unconditionally read l/L types as i64/u64, but on platforms where c_long is 4 bytes (Windows, 32-bit systems), this reads beyond the allocated result buffer, causing undefined behavior.

Apply this diff to handle platform-specific sizes:

-            Some("l") => vm.ctx.new_int(unsafe { *(value as *const i64) }).into(),
-            Some("L") => vm.ctx.new_int(unsafe { *(value as *const u64) }).into(),
+            Some("l") => {
+                if std::mem::size_of::<std::ffi::c_long>() == 4 {
+                    vm.ctx.new_int(unsafe { *(value as *const i32) }).into()
+                } else {
+                    vm.ctx.new_int(unsafe { *(value as *const i64) }).into()
+                }
+            }
+            Some("L") => {
+                if std::mem::size_of::<std::ffi::c_ulong>() == 4 {
+                    vm.ctx.new_int(unsafe { *(value as *const u32) }).into()
+                } else {
+                    vm.ctx.new_int(unsafe { *(value as *const u64) }).into()
+                }
+            }
crates/vm/src/stdlib/ctypes/union.rs (3)

197-214: from_address creates zero-initialized buffer instead of reading from address.

The implementation creates an empty CDataObject from stg_info rather than reading the data from the provided memory address. This differs from PyCStructure::from_address which reads actual bytes from the address.

Apply this diff to read data from the address:

     #[pyclassmethod]
     fn from_address(cls: PyTypeRef, address: isize, vm: &VirtualMachine) -> PyResult {
         use crate::stdlib::ctypes::_ctypes::size_of;

         // Get size from cls
         let size = size_of(cls.clone().into(), vm)?;

         // Create instance with data from address
         if address == 0 || size == 0 {
             return Err(vm.new_value_error("NULL pointer access".to_owned()));
         }
-        let stg_info = StgInfo::new(size, 1);
+
+        // SAFETY: Caller must ensure address points to valid memory of at least size bytes
+        let data = unsafe {
+            let ptr = address as *const u8;
+            std::slice::from_raw_parts(ptr, size).to_vec()
+        };
+
         Ok(PyCUnion {
-            cdata: PyRwLock::new(CDataObject::from_stg_info(&stg_info)),
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
         }
         .into_ref_with_type(vm, cls)?
         .into())
     }

173-177: Alignment calculation incorrectly uses field size instead of field alignment.

Setting max_align = max_align.max(size) assumes alignment equals size, which is incorrect for:

  • Arrays (alignment = element alignment, not total array size)
  • Nested structures (alignment = max field alignment)

Use proper alignment calculation:

                 let field_type = field_tuple.get(1).unwrap().clone();
                 let size = PyCUnionType::get_field_size(&field_type, vm)?;
                 max_size = max_size.max(size);
-                // For simple types, alignment == size
-                max_align = max_align.max(size);
+                // Get proper alignment for the field type
+                let align = crate::stdlib::ctypes::_ctypes::alignment(
+                    crate::function::Either::B(field_type),
+                    vm,
+                ).unwrap_or(1);
+                max_align = max_align.max(align);

250-259: from_buffer doesn't preserve source buffer reference.

The source buffer should be stored to prevent garbage collection while the union references its memory. Pass Some(source) instead of None.

         // Copy data from source buffer
         let bytes = buffer.obj_bytes();
         let data = bytes[offset..offset + size].to_vec();

         Ok(PyCUnion {
-            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, Some(source))),
         }
crates/vm/src/stdlib/ctypes/thunk.rs (3)

60-64: Exception in callable leaves FFI result uninitialized.

When the Python callable raises an exception, python_to_ffi returns early without writing to result. The C caller will receive whatever garbage was in the result buffer.

Consider zero-initializing the result on error:

     let obj = match obj {
         Ok(o) => o,
-        Err(_) => return, // Exception occurred, leave result as-is
+        Err(_) => {
+            // Zero-initialize result on error
+            unsafe { std::ptr::write_bytes(result, 0, 1); }
+            return;
+        }
     };

Note: CPython sets an exception flag that the caller should check; this may be acceptable if RustPython has similar semantics.


105-113: Platform mismatch in python_to_ffi for c_long/c_ulong.

Lines 105-113 write l/L as i64/u64, but on platforms where c_long is 4 bytes (32-bit or Windows LLP64), this writes 8 bytes to a 4-byte buffer, causing memory corruption.

-            Some("l") | Some("q") => {
+            Some("l") => {
+                if let Ok(i) = obj.try_int(vm) {
+                    const SIZE: usize = std::mem::size_of::<libc::c_long>();
+                    if SIZE == 4 {
+                        *(result as *mut i32) = i.as_bigint().to_i32().unwrap_or(0);
+                    } else {
+                        *(result as *mut i64) = i.as_bigint().to_i64().unwrap_or(0);
+                    }
+                }
+            }
+            Some("q") => {
                 if let Ok(i) = obj.try_int(vm) {
                     *(result as *mut i64) = i.as_bigint().to_i64().unwrap_or(0);
                 }
             }
-            Some("L") | Some("Q") => {
+            Some("L") => {
+                if let Ok(i) = obj.try_int(vm) {
+                    const SIZE: usize = std::mem::size_of::<libc::c_ulong>();
+                    if SIZE == 4 {
+                        *(result as *mut u32) = i.as_bigint().to_u32().unwrap_or(0);
+                    } else {
+                        *(result as *mut u64) = i.as_bigint().to_u64().unwrap_or(0);
+                    }
+                }
+            }
+            Some("Q") => {
                 if let Ok(i) = obj.try_int(vm) {
                     *(result as *mut u64) = i.as_bigint().to_u64().unwrap_or(0);
                 }
             }

306-311: Fix unsafe Send+Sync impl for PyCThunk — libffi Closure is not thread-safe.

The safety comment is incorrect. libffi-rs::middle::Closure explicitly implements !Send and !Sync (negative trait bounds) because it ties a userdata reference lifetime to the closure with thread-safety constraints. Simply being heap-allocated does not guarantee thread safety.

The current unsafe impl violates these constraints:

// SAFETY: PyCThunk is safe to send/sync because:
// - callable is a PyObjectRef which is Send+Sync
// - thunk_data contains the libffi closure which is heap-allocated  ❌ INCORRECT
// - code_ptr is just a pointer to executable memory
unsafe impl Send for PyCThunk {}
unsafe impl Sync for PyCThunk {}

Either remove these implementations, wrap thunk_data in thread-safe synchronization (e.g., Mutex), or provide a valid safety argument with proper analysis of the Closure's userdata and lifetime guarantees.

crates/vm/src/stdlib/ctypes/base.rs (1)

848-852: from_buffer doesn't preserve source buffer reference.

The TODO indicates a missing implementation. The source buffer object should be stored in the instance's cdata.objects to prevent garbage collection while the ctypes object is alive. Without this, the underlying memory could be freed while still referenced.

         // Create instance
         let instance = PyCSimple::py_new(cls.clone(), (OptionalArg::Present(value),), vm)?;
 
-        // TODO: Store reference to source in _objects to keep buffer alive
+        // Store reference to source buffer to keep it alive
+        if let Ok(simple_ref) = instance.clone().downcast::<PyCSimple>() {
+            simple_ref.cdata.write().objects = Some(source);
+        }
+
         Ok(instance)
crates/vm/src/stdlib/ctypes/array.rs (1)

750-756: Critical: to_arg creates a dangling pointer to a dropped temporary.

Arg::new(&cdata.buffer) borrows a reference to the buffer, but the RwLockReadGuard (cdata) is dropped at the end of the statement. The returned Arg will hold a dangling pointer to the freed buffer.

Consider returning the buffer alongside the Arg to keep it alive, or restructuring to ensure the buffer outlives the Arg:

 impl PyCArray {
     #[allow(unused)]
-    pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<libffi::middle::Arg> {
-        let cdata = self.cdata.read();
-        Ok(libffi::middle::Arg::new(&cdata.buffer))
+    pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<(Vec<u8>, libffi::middle::Arg)> {
+        let buffer = self.cdata.read().buffer.clone();
+        let arg = libffi::middle::Arg::new(&buffer);
+        Ok((buffer, arg))
     }
 }

Note: The caller must keep the returned buffer alive for the duration of the FFI call.

🧹 Nitpick comments (1)
crates/vm/src/stdlib/ctypes/structure.rs (1)

467-490: Double lock guard mapping could be simplified.

The obj_bytes and obj_bytes_mut callbacks use nested map calls which could be simplified to a single mapping, similar to how PyCUnion does it.

 static STRUCTURE_BUFFER_METHODS: BufferMethods = BufferMethods {
     obj_bytes: |buffer| {
-        rustpython_common::lock::PyMappedRwLockReadGuard::map(
-            rustpython_common::lock::PyRwLockReadGuard::map(
-                buffer.obj_as::<PyCStructure>().cdata.read(),
-                |x: &CDataObject| x,
-            ),
+        rustpython_common::lock::PyRwLockReadGuard::map(
+            buffer.obj_as::<PyCStructure>().cdata.read(),
             |x: &CDataObject| x.buffer.as_slice(),
         )
         .into()
     },
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8933738 and b92d266.

📒 Files selected for processing (9)
  • crates/vm/src/stdlib/ctypes.rs (7 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (13 hunks)
  • crates/vm/src/stdlib/ctypes/base.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (9 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/vm/src/stdlib/ctypes/util.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style in Rust code (cargo fmt to format)
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/base.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/base.rs
🧠 Learnings (5)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/base.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/base.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/base.rs
📚 Learning: 2025-11-25T11:05:02.899Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.899Z
Learning: Applies to **/src/**/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/stdlib/ctypes/function.rs
  • crates/vm/src/stdlib/ctypes.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
🧬 Code graph analysis (3)
crates/vm/src/stdlib/ctypes/function.rs (1)
crates/vm/src/stdlib/ctypes/thunk.rs (5)
  • t (31-31)
  • new (213-298)
  • code_ptr (301-303)
  • args (225-225)
  • item (226-226)
crates/vm/src/stdlib/ctypes/thunk.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (8)
  • std (542-542)
  • std (550-550)
  • std (755-755)
  • std (758-758)
  • std (761-761)
  • ffi_type_from_str (25-48)
  • get_type_code (18-23)
  • t (22-22)
crates/vm/src/stdlib/ctypes/pointer.rs (2)
crates/vm/src/stdlib/ctypes/array.rs (1)
  • __mul__ (96-121)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run tests under miri
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (9)
crates/vm/src/stdlib/ctypes/function.rs (2)

274-315: LGTM: Callback creation path correctly manages thunk lifetime.

The implementation properly:

  • Creates a PyCThunk wrapper for the Python callable
  • Captures the code pointer before converting to PyRef
  • Stores the thunk reference in handler to keep it alive for the duration of the PyCFuncPtr

341-370: Good defensive handling with ok() for optional downcasts.

The use of .ok() instead of .unwrap() for the downcast::<PyType>() calls properly handles cases where the return type might not be a PyType, falling back gracefully to Type::i32 or the default integer conversion.

crates/vm/src/stdlib/ctypes.rs (2)

163-166: Alignment assumption is correct for simple C types.

For the simple type characters handled by ctypes (c, b, h, i, l, etc.), the C ABI guarantees that alignment equals size. This is the standard behavior.


184-347: Comprehensive bytes-to-Python conversion with proper platform handling.

The implementation correctly uses mem::size_of for each C type, ensuring platform-appropriate byte interpretation. The WideChar handling properly distinguishes between 2-byte and 4-byte wide characters based on platform.

crates/vm/src/stdlib/ctypes/pointer.rs (3)

24-41: Good: __mul__ properly creates array type with stg_info.

The implementation correctly:

  • Calculates total size and element size for pointers
  • Creates StgInfo with proper length
  • Constructs PyCArrayType with all required fields

122-133: Correct implementation of from_address for pointer type.

The method correctly stores the address value without dereferencing it, which matches the expected semantics for pointer types. The NULL check prevents creating pointers to address 0.


215-268: Good implementation of in_dll for loading symbols from DLLs.

The method correctly:

  • Handles both integer handles and objects with _handle attribute
  • Uses the library cache for lookup
  • Properly null-terminates the symbol name
  • Returns the symbol address in a new pointer instance
crates/vm/src/stdlib/ctypes/union.rs (1)

299-328: AsBuffer implementation correctly exposes union memory buffer.

The implementation properly:

  • Uses lock guards for thread-safe buffer access
  • Marks buffer as writable (readonly=false)
  • Follows the same pattern as PyCStructure
crates/vm/src/stdlib/ctypes/structure.rs (1)

414-420: Good: from_buffer now preserves source buffer reference.

The implementation correctly passes Some(source) to CDataObject::from_bytes, keeping the source buffer alive while the structure references its data.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (13)
crates/vm/src/stdlib/ctypes/function.rs (1)

111-112: Platform mismatch: c_long/c_ulong return types read as 8 bytes.

Lines 111-112 always read l/L types as i64/u64, but on platforms where c_long is 4 bytes (Windows, 32-bit systems), this reads beyond the allocated result buffer.

Apply this diff to fix the platform-dependent size:

-            Some("l") => vm.ctx.new_int(unsafe { *(value as *const i64) }).into(),
-            Some("L") => vm.ctx.new_int(unsafe { *(value as *const u64) }).into(),
+            Some("l") => {
+                if std::mem::size_of::<libc::c_long>() == 4 {
+                    vm.ctx.new_int(unsafe { *(value as *const i32) }).into()
+                } else {
+                    vm.ctx.new_int(unsafe { *(value as *const i64) }).into()
+                }
+            }
+            Some("L") => {
+                if std::mem::size_of::<libc::c_ulong>() == 4 {
+                    vm.ctx.new_int(unsafe { *(value as *const u32) }).into()
+                } else {
+                    vm.ctx.new_int(unsafe { *(value as *const u64) }).into()
+                }
+            }
crates/vm/src/stdlib/ctypes/array.rs (1)

750-756: Critical: to_arg creates a dangling pointer to borrowed data.

libffi::middle::Arg::new(&cdata.buffer) borrows a reference to the buffer through the read guard. When the guard is dropped at the end of the function, the Arg will hold a dangling pointer.

The returned Arg must not outlive the buffer. Consider returning the owned data alongside the Arg:

 impl PyCArray {
     #[allow(unused)]
-    pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<libffi::middle::Arg> {
-        let cdata = self.cdata.read();
-        Ok(libffi::middle::Arg::new(&cdata.buffer))
+    pub fn to_arg(&self, _vm: &VirtualMachine) -> PyResult<(Vec<u8>, libffi::middle::Arg)> {
+        let buffer = self.cdata.read().buffer.clone();
+        // SAFETY: Caller must ensure `buffer` outlives the FFI call
+        let arg = unsafe { libffi::middle::Arg::new(&buffer) };
+        Ok((buffer, arg))
     }
 }

Alternatively, restructure the call site to hold the guard alive during the FFI call.

crates/vm/src/stdlib/ctypes/union.rs (3)

173-177: Alignment calculation assumes alignment equals size.

Setting max_align = max_align.max(size) assumes every field's alignment equals its size. This isn't true for composite types like arrays (where alignment is the element's alignment, not the array size) or structures with padding.

Consider computing proper field alignment:

                 let field_type = field_tuple.get(1).unwrap().clone();
                 let size = PyCUnionType::get_field_size(&field_type, vm)?;
                 max_size = max_size.max(size);
-                // For simple types, alignment == size
-                max_align = max_align.max(size);
+                // Get actual alignment (may differ from size for arrays/structs)
+                let align = super::_ctypes::alignment(
+                    crate::function::Either::B(field_type),
+                    vm
+                ).unwrap_or(1);
+                max_align = max_align.max(align);

197-214: from_address doesn't read data from the address.

The implementation creates a zero-initialized buffer via CDataObject::from_stg_info but never reads the actual data at the provided address. This defeats the purpose of from_address.

Apply this diff to actually read from the address:

     #[pyclassmethod]
     fn from_address(cls: PyTypeRef, address: isize, vm: &VirtualMachine) -> PyResult {
         use crate::stdlib::ctypes::_ctypes::size_of;

         let size = size_of(cls.clone().into(), vm)?;

         if address == 0 || size == 0 {
             return Err(vm.new_value_error("NULL pointer access".to_owned()));
         }
-        let stg_info = StgInfo::new(size, 1);
+
+        // SAFETY: Caller must ensure `address` points to valid memory of at least `size` bytes
+        let data = unsafe {
+            let ptr = address as *const u8;
+            std::slice::from_raw_parts(ptr, size).to_vec()
+        };
+
         Ok(PyCUnion {
-            cdata: PyRwLock::new(CDataObject::from_stg_info(&stg_info)),
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
         }
         .into_ref_with_type(vm, cls)?
         .into())
     }

250-258: from_buffer doesn't preserve source buffer reference.

Passing None to CDataObject::from_bytes means the source buffer object isn't retained. If the source is garbage collected, the union's view into that memory becomes invalid.

         // Copy data from source buffer
         let bytes = buffer.obj_bytes();
         let data = bytes[offset..offset + size].to_vec();

         Ok(PyCUnion {
-            cdata: PyRwLock::new(CDataObject::from_bytes(data, None)),
+            cdata: PyRwLock::new(CDataObject::from_bytes(data, Some(source))),
         }
         .into_ref_with_type(vm, cls)?
         .into())
crates/vm/src/stdlib/ctypes.rs (1)

455-477: calculate_struct_size doesn't account for field alignment and padding.

The function sums field sizes without considering C structure layout rules. In C, fields are aligned to their natural alignment, and the structure is padded to its largest alignment. For example, {char, int} would be 8 bytes on most platforms, not 5.

Consider alignment-aware layout:

 fn calculate_struct_size(
     cls: &crate::builtins::PyTypeRef,
     vm: &VirtualMachine,
 ) -> PyResult<usize> {
     use crate::AsObject;

     if let Ok(fields_attr) = cls.as_object().get_attr("_fields_", vm) {
         let fields: Vec<PyObjectRef> = fields_attr.try_to_value(vm).unwrap_or_default();
-        let mut total_size = 0usize;
+        let mut offset = 0usize;
+        let mut max_align = 1usize;

         for field in fields.iter() {
             if let Some(tuple) = field.downcast_ref::<crate::builtins::PyTuple>()
                 && let Some(field_type) = tuple.get(1)
             {
-                total_size += size_of(field_type.clone(), vm)?;
+                let field_size = size_of(field_type.clone(), vm)?;
+                let field_align = alignment(Either::A(field_type.clone().downcast().unwrap_or(vm.ctx.types.object_type.clone())), vm).unwrap_or(1);
+                max_align = max_align.max(field_align);
+                offset = (offset + field_align - 1) & !(field_align - 1);
+                offset += field_size;
             }
         }
-        return Ok(total_size);
+        offset = (offset + max_align - 1) & !(max_align - 1);
+        return Ok(offset);
     }
     Ok(0)
 }
crates/vm/src/stdlib/ctypes/pointer.rs (1)

135-177: Consider tracking source buffer reference for CPython compatibility.

The current implementation reads the pointer value from the buffer but doesn't retain a reference to the source object. While PyCPointer stores just the pointer value (not a view), CPython's from_buffer typically keeps the source alive to prevent premature garbage collection.

Since PyCPointer doesn't have a cdata field with an objects slot (unlike PyCStructure/PyCSimple), consider either:

  1. Adding an optional source: PyRwLock<Option<PyObjectRef>> field to PyCPointer
  2. Documenting why source tracking is unnecessary for pointer types in this implementation
crates/vm/src/stdlib/ctypes/structure.rs (2)

347-370: Populate fields map from class metadata in all three constructors.

All three constructors (from_address, from_buffer, from_buffer_copy) create structures with an empty fields map (IndexMap::new()). If code relies on self.fields for field access, these instances won't work correctly.

Consider extracting the field-parsing logic from py_new (lines 243-294) into a helper method that all constructors can reuse to populate the fields map from the class's _fields_ attribute.

Also applies to: 372-421, 423-464


334-335: Add AsBuffer to the with() clause.

The AsBuffer trait is implemented for PyCStructure (lines 492-502), but it's not declared in the #[pyclass] attribute's with() clause. Without this, the buffer protocol may not be properly exposed to Python.

Apply this diff:

-#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor))]
+#[pyclass(flags(BASETYPE, IMMUTABLETYPE), with(Constructor, AsBuffer))]
 impl PyCStructure {
crates/vm/src/stdlib/ctypes/base.rs (2)

774-801: Consider adding alignment validation for from_address.

Reading from an arbitrary address without verifying alignment can cause undefined behavior on platforms requiring aligned access for certain types. While reading as u8 bytes mitigates some concerns, the address should ideally match the type's alignment requirements.

Consider adding a runtime check:

let type_str = type_attr.str(vm)?.to_string();
let size = get_size(&type_str);
let align = get_align(&type_str);
if (address as usize) % align != 0 {
    return Err(vm.new_value_error("address is not properly aligned".to_owned()));
}

803-853: Implement source buffer tracking in from_buffer.

The TODO at line 851 correctly identifies that the source buffer reference must be stored to prevent garbage collection. Without this, the underlying memory could be freed while the ctypes object still references it.

Apply this approach:

// Create instance
let instance = PyCSimple::py_new(cls.clone(), (OptionalArg::Present(value),), vm)?;

// Store reference to source buffer in _objects to keep it alive
if let Ok(simple_ref) = instance.clone().downcast::<PyCSimple>() {
    simple_ref.cdata.write().objects = Some(source);
}

Ok(instance)
crates/vm/src/stdlib/ctypes/thunk.rs (2)

105-113: Fix platform-specific size mismatch for c_long/c_ulong.

Lines 105-113 unconditionally write 64-bit values for both l/q and L/Q. On platforms where c_long/c_ulong are 32-bit (Windows 64-bit LLP64, 32-bit platforms), this writes 8 bytes to a 4-byte buffer, causing memory corruption.

Apply this fix:

-            Some("l") | Some("q") => {
+            Some("l") => {
+                if let Ok(i) = obj.try_int(vm) {
+                    if std::mem::size_of::<libc::c_long>() == 4 {
+                        *(result as *mut i32) = i.as_bigint().to_i32().unwrap_or(0);
+                    } else {
+                        *(result as *mut i64) = i.as_bigint().to_i64().unwrap_or(0);
+                    }
+                }
+            }
+            Some("q") => {
                 if let Ok(i) = obj.try_int(vm) {
                     *(result as *mut i64) = i.as_bigint().to_i64().unwrap_or(0);
                 }
             }
-            Some("L") | Some("Q") => {
+            Some("L") => {
+                if let Ok(i) = obj.try_int(vm) {
+                    if std::mem::size_of::<libc::c_ulong>() == 4 {
+                        *(result as *mut u32) = i.as_bigint().to_u32().unwrap_or(0);
+                    } else {
+                        *(result as *mut u64) = i.as_bigint().to_u64().unwrap_or(0);
+                    }
+                }
+            }
+            Some("Q") => {
                 if let Ok(i) = obj.try_int(vm) {
                     *(result as *mut u64) = i.as_bigint().to_u64().unwrap_or(0);
                 }
             }

306-311: Remove or fix unsafe Send+Sync impl — libffi Closure is !Send and !Sync.

The safety comment is incorrect. libffi-rs::middle::Closure explicitly implements !Send and !Sync (negative trait bounds) because proper thread-safety requires careful handling of the closure's userdata and lifetime. Simply being heap-allocated does not guarantee thread safety.

According to the libffi-rs documentation, the caller is responsible for ensuring thread-safety when closures may be called from other threads. The current blanket unsafe impl violates these constraints.

Either:

  1. Remove these unsafe impls entirely
  2. Wrap thunk_data in proper synchronization (e.g., Arc<Mutex<...>>)
  3. Provide a detailed, correct safety justification proving the Closure's userdata and lifetimes make cross-thread access safe

Update the safety comment to match the chosen approach.

🧹 Nitpick comments (1)
crates/vm/src/stdlib/ctypes/structure.rs (1)

140-156: Consider extending alignment calculation for complex types.

The current implementation only computes alignment for simple types (where alignment equals size). For pointer types, nested structures, and arrays, the default alignment of 1 may be incorrect and could lead to misaligned field access on some platforms.

Consider handling:

  • Pointer types: use std::mem::align_of::<*const ()>()
  • Nested structures: use their maximum field alignment (via stg_info)
  • Arrays: use element alignment
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b92d266 and 685690f.

📒 Files selected for processing (10)
  • .cspell.dict/cpython.txt (1 hunks)
  • crates/vm/src/stdlib/ctypes.rs (7 hunks)
  • crates/vm/src/stdlib/ctypes/array.rs (13 hunks)
  • crates/vm/src/stdlib/ctypes/base.rs (10 hunks)
  • crates/vm/src/stdlib/ctypes/function.rs (5 hunks)
  • crates/vm/src/stdlib/ctypes/pointer.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/structure.rs (9 hunks)
  • crates/vm/src/stdlib/ctypes/thunk.rs (1 hunks)
  • crates/vm/src/stdlib/ctypes/union.rs (3 hunks)
  • crates/vm/src/stdlib/ctypes/util.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/vm/src/stdlib/ctypes/util.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style in Rust code (cargo fmt to format)
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/function.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/pointer.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/function.rs
🧠 Learnings (5)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
  • crates/vm/src/stdlib/ctypes/union.rs
  • crates/vm/src/stdlib/ctypes/thunk.rs
  • crates/vm/src/stdlib/ctypes/structure.rs
  • crates/vm/src/stdlib/ctypes/base.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.

Applied to files:

  • crates/vm/src/stdlib/ctypes/array.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/function.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/base.rs
  • crates/vm/src/stdlib/ctypes/function.rs
📚 Learning: 2025-11-25T11:05:02.899Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-25T11:05:02.899Z
Learning: Applies to **/src/**/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/stdlib/ctypes.rs
  • crates/vm/src/stdlib/ctypes/function.rs
🧬 Code graph analysis (5)
crates/vm/src/stdlib/ctypes/array.rs (3)
crates/vm/src/stdlib/ctypes.rs (4)
  • std (417-417)
  • std (448-448)
  • std (696-696)
  • size_of (391-453)
crates/vm/src/stdlib/ctypes/base.rs (11)
  • std (542-542)
  • std (550-550)
  • std (755-755)
  • std (758-758)
  • std (761-761)
  • new (257-263)
  • from_bytes (186-193)
  • buffer (1016-1016)
  • buffer (1026-1026)
  • instance (965-965)
  • size (213-215)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
crates/vm/src/stdlib/ctypes.rs (5)
crates/vm/src/stdlib/ctypes/base.rs (9)
  • std (542-542)
  • std (550-550)
  • std (755-755)
  • std (758-758)
  • std (761-761)
  • size (213-215)
  • new (257-263)
  • t (22-22)
  • get_stg_info (242-254)
crates/vm/src/stdlib/ctypes/pointer.rs (3)
  • std (30-30)
  • std (150-150)
  • std (191-191)
crates/vm/src/stdlib/ctypes/structure.rs (3)
  • std (137-137)
  • field (76-77)
  • field (262-262)
crates/vm/src/stdlib/ctypes/field.rs (6)
  • size (259-261)
  • new (36-53)
  • obj (107-107)
  • obj (116-116)
  • obj (201-201)
  • obj (218-218)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
crates/vm/src/stdlib/ctypes/union.rs (2)
crates/vm/src/stdlib/ctypes.rs (5)
  • get_size (142-161)
  • std (417-417)
  • std (448-448)
  • std (696-696)
  • size_of (391-453)
crates/vm/src/stdlib/ctypes/util.rs (2)
  • default (18-27)
  • new (31-40)
crates/vm/src/stdlib/ctypes/thunk.rs (1)
crates/vm/src/stdlib/ctypes/base.rs (8)
  • std (542-542)
  • std (550-550)
  • std (755-755)
  • std (758-758)
  • std (761-761)
  • ffi_type_from_str (25-48)
  • get_type_code (18-23)
  • t (22-22)
crates/vm/src/stdlib/ctypes/pointer.rs (2)
crates/vm/src/stdlib/ctypes/array.rs (1)
  • __mul__ (96-121)
crates/vm/src/stdlib/ctypes/util.rs (1)
  • new (31-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
🔇 Additional comments (22)
.cspell.dict/cpython.txt (1)

53-53: LGTM!

The dictionary entry is correctly positioned alphabetically and aligns with the swapped-endianness handling introduced in this PR.

crates/vm/src/stdlib/ctypes/function.rs (2)

274-315: LGTM - Callback creation for Python callables.

The implementation correctly:

  • Creates a PyCThunk wrapping the Python callable
  • Extracts and stores the code pointer
  • Parses _argtypes_ with proper None handling
  • Manages thunk lifetime via PyRef<PyCThunk>

344-346: Good defensive change using ok() instead of unwrap().

Using ok() for the downcast prevents panics if the return type isn't a PyType, gracefully falling back to Type::i32.

crates/vm/src/stdlib/ctypes/array.rs (3)

588-597: LGTM - Buffer reference tracking now implemented.

The from_buffer method now correctly passes Some(buffer.obj.clone()) to CDataObject::from_bytes, preserving the source buffer reference to prevent it from being garbage collected.


758-792: LGTM - AsBuffer implementation.

The buffer methods correctly use mapped lock guards to safely expose the underlying buffer, and readonly=false is appropriate for ctypes mutable arrays.


502-529: Verify caller ensures valid memory at address.

The from_address method reads from a raw memory address. While the NULL and zero-size checks are good, the caller must ensure the memory at address is valid and properly sized.

Consider adding a safety note in documentation that the caller is responsible for ensuring the address points to valid, accessible memory of at least size bytes.

crates/vm/src/stdlib/ctypes/union.rs (1)

299-328: LGTM - AsBuffer implementation.

The buffer methods implementation is consistent with PyCArray and correctly exposes the union's memory buffer.

crates/vm/src/stdlib/ctypes.rs (2)

184-347: LGTM - bytes_to_pyobject uses correct platform-aware sizes.

The implementation correctly uses mem::size_of for each C type, ensuring platform-appropriate byte reads for c_long, c_int, etc. This addresses the platform-dependent size concern from earlier reviews.


660-755: LGTM - Comprehensive alignment handling.

The alignment function correctly handles multiple cases including instances, type objects, arrays, pointers, and structures/unions by computing maximum field alignment. The delegation pattern for instances to their class types is sound.

crates/vm/src/stdlib/ctypes/pointer.rs (3)

18-18: LGTM! Storage metadata integration looks correct.

The new stg_info field and its use in __mul__ properly align with the CDataObject-backed storage model. The size calculation and PyCArrayType construction follow the pattern established in array.rs.

Also applies to: 32-38


75-120: LGTM! Constructor integration is well-implemented.

The Constructor trait implementation and __init__ method correctly handle optional initialization values, aligning with the pattern used in other ctypes types.


122-133: LGTM! NULL check and address handling are appropriate.

The implementation correctly validates the address and creates a pointer instance with the address stored as contents.

crates/vm/src/stdlib/ctypes/structure.rs (3)

1-3: LGTM! Storage model integration is consistent.

The new imports and stg_info field properly integrate PyCStructType with the CDataObject-backed storage model used across ctypes types.

Also applies to: 20-23


158-182: LGTM! Array type creation correctly uses size_of.

The __mul__ implementation properly computes element size via size_of and constructs PyCArrayType with the appropriate stg_info, consistent with the new storage model.


224-229: LGTM! CDataObject-backed storage is well-integrated.

The migration to cdata: PyRwLock<CDataObject> and the updated constructor with alignment-aware initialization correctly implement the new storage model. The StgInfo usage aligns with the pattern across other ctypes types.

Also applies to: 248-302

crates/vm/src/stdlib/ctypes/base.rs (5)

17-23: LGTM! Core CDataObject abstractions are well-designed.

The new get_type_code helper, CDataObject backing structure with its constructors, and the PyCData wrapper provide a solid foundation for the unified storage model across ctypes types.

Also applies to: 159-254


266-383: LGTM! Comprehensive from_param implementation.

The expanded from_param implementation correctly handles type-specific conversions, None for pointer types, and falls back to _as_parameter_ attribute. The logic aligns well with CPython's ctypes behavior.


434-608: LGTM! Platform-aware sizing for c_long/c_ulong is now correct.

The value_to_bytes_endian function now properly handles platform-dependent types using libc::c_long/libc::c_ulong and std::mem::size_of, addressing the previous issue where 4-byte sizes were hardcoded. The endianness support via the swapped parameter is also well-implemented.


646-661: LGTM! Endianness-aware initialization and value management.

The integration of swapped endianness checks in py_new and set_value, along with the unsigned conversion logic in the value getter, correctly implements the required semantics for simple ctypes with proper buffer synchronization.

Also applies to: 676-739


898-970: LGTM! Excellent dll reference tracking.

The in_dll implementation correctly stores a reference to the dll object in cdata.base (lines 965-967), ensuring the library remains loaded as long as the ctypes object exists. This prevents use-after-free issues.

crates/vm/src/stdlib/ctypes/thunk.rs (2)

1-57: LGTM! FFI helper utilities are well-structured.

The ThunkUserData, get_type_code, and ffi_to_python functions provide clean abstractions for the thunk implementation. The use of libc::c_long and libc::c_ulong (lines 47-50) ensures platform-correct type handling.


138-304: LGTM! Thunk lifecycle management is well-designed.

The thunk_callback, ThunkData with its Drop implementation, and PyCThunk::new() together provide proper lifetime management for the libffi closure. Leaking userdata for 'static lifetime and reclaiming it in Drop is the correct pattern for this use case.

Comment on lines +825 to +872
#[pyfunction(name = "_cast")]
pub fn pycfunction_cast(
obj: PyObjectRef,
_obj2: PyObjectRef,
ctype: PyObjectRef,
vm: &VirtualMachine,
) -> PyResult {
use super::array::PyCArray;
use super::base::PyCData;
use super::pointer::PyCPointer;
use crate::class::StaticType;

// Python signature: _cast(obj, obj, ctype)
// Python passes the same object twice (obj and _obj2 are the same)
// We ignore _obj2 as it's redundant

// Check if this is a pointer type (has _type_ attribute)
if ctype.get_attr("_type_", vm).is_err() {
return Err(vm.new_type_error("cast() argument 2 must be a pointer type".to_string()));
}

// Create an instance of the target pointer type with no arguments
let result = ctype.call((), vm)?;

// Get the pointer value from the source object
// If obj is a CData instance (including arrays), use the object itself
// If obj is an integer, use it directly as the pointer value
let ptr_value: PyObjectRef = if obj.fast_isinstance(PyCData::static_type())
|| obj.fast_isinstance(PyCArray::static_type())
|| obj.fast_isinstance(PyCPointer::static_type())
{
// For CData objects (including arrays and pointers), store the object itself
obj.clone()
} else if let Ok(int_val) = obj.try_int(vm) {
// For integers, treat as pointer address
vm.ctx.new_int(int_val.as_bigint().clone()).into()
} else {
return Err(vm.new_type_error(format!(
"cast() argument 1 must be a ctypes instance or an integer, not {}",
obj.class().name()
)));
};

// Set the contents of the pointer by setting the attribute
result.set_attr("contents", ptr_value, vm)?;

Ok(result)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

CPython ctypes cast function behavior memory address pointer

💡 Result:

Short summary of CPython ctypes.cast behavior:

  • ctypes.cast(obj, new_type) returns a new pointer instance of new_type that views the same memory as obj (it does not copy memory). new_type must be a pointer type. [1][2]
  • obj may be a ctypes instance, a pointer, an integer address, or some buffer-like Python objects; cast will convert the first arg to c_void_p under the hood for the FFI call. [2][3]
  • cast does NOT extend the lifetime of the underlying buffer — you must keep a reference to the original object (or otherwise ensure the memory stays valid) or the pointer becomes dangling. [3]
  • pointer.contents returns a freshly-constructed ctypes object each access (ctypes has no OOR — "original object return"), so identity comparisons with is will be False; use ctypes.addressof(...) to compare underlying addresses. [1][7]
  • Use ctypes.addressof(obj) to get the integer address of a ctypes instance; use byref(obj, offset) to get a lightweight pointer for calls. [1][2]

Concise example:

  • ptr = ctypes.cast(ctypes.pointer(my_struct), ctypes.POINTER(OtherStruct)) — ptr points at the same memory, interpreted as OtherStruct; changing ptr.contents updates underlying memory; but keep my_struct alive. [1][3]

Sources: Python ctypes docs and practical notes. [1] Python ctypes Pointers (docs). [2] Python ctypes Utility functions (addressof, byref, cast). [3] Discussion/examples on using cast with bytes/addresses.


🏁 Script executed:

#!/bin/bash
# Find pointer-related files in ctypes
fd -t f "pointer" crates/vm/src/stdlib/ | head -20

Repository: RustPython/RustPython

Length of output: 186


🏁 Script executed:

#!/bin/bash
# Look for contents attribute handling
rg -n "contents" crates/vm/src/stdlib/ctypes/ -B2 -A2 --type rust

Repository: RustPython/RustPython

Length of output: 186


🏁 Script executed:

#!/bin/bash
# Search for cast function tests
rg -n "test.*cast|cast.*test" crates/vm/src/stdlib/ --type rust -i

Repository: RustPython/RustPython

Length of output: 186


🌐 Web query:

RustPython ctypes cast pointer contents behavior

💡 Result:

Short summary of ctypes cast / pointer / contents behavior you’ll see in CPython (and what RustPython aims to match):

  • pointer.contents returns a new ctypes instance that represents the object pointed-to — it is not the original Python object (pi.contents is i is False). Reading .contents constructs a fresh wrapper each access. [1]
  • ctypes.cast(obj, POINTER(T)) returns a pointer-wrapper that references the same underlying memory block as obj (no copy of the raw bytes). Using that pointer to read/write (via indexing or .contents) operates on the same memory. [1]
  • For “simple” ctypes (immediate subclasses of _SimpleCData such as c_void_p, c_char_p, c_int, etc.) ctypes does automatic conversion/autocast to native Python types in many contexts (returned from functions or read from struct/array fields). That behavior can make .contents or field access return Python ints/bytes rather than ctypes objects and has caused surprises/bugs. A Python tracker discussion explains the autocast and workarounds (subclassing) if you need a ctypes object instead of the native type. [2]
  • Practical gotchas seen in real code: repeated cast/create operations can hold references or create wrapper objects that increase memory usage (people have observed apparent leaks when repeatedly casting large buffers). The cast itself references the buffer but constructing many wrapper objects can grow memory use. [3]

If you need guidance for a specific RustPython mismatch or a reproducer (example code showing different behavior vs CPython), share the snippet and I’ll look up/compare exact behavior and point to whether it’s a RustPython bug or a libc/ctypes semantics issue.

Sources:
[1] Python ctypes docs (pointers, cast, contents).
[2] Python tracker discussion on simple-types autocast (issue discussion / workaround).
[3] StackOverflow thread: ctypes.cast memory/leak observations.


🌐 Web query:

RustPython ctypes implementation pointer contents

💡 Result:

Short answer: RustPython's ctypes support is incomplete — pointer.contents semantics from CPython (where .contents returns a new ctypes instance referencing the pointed memory) are not fully implemented/guaranteed in RustPython yet. See the RustPython repo/issues noting "Implement the ctypes module" and ongoing work; for CPython behavior refer to the official ctypes docs (pointer() / POINTER() / .contents). [1][2]

References

  • RustPython repo / roadmap mentioning ctypes implementation work. [1]
  • CPython ctypes documentation (pointer and .contents behavior). [2]

Based on my verification through web search, I can now provide a rewritten review comment. The investigation confirms that CPython's cast() and pointer.contents behavior differs from the current RustPython implementation, though RustPython's ctypes module is still incomplete.

Behavior differs from CPython: cast() should return a pointer viewing the same memory address, and contents should construct a fresh wrapper on each access.

Per CPython semantics, cast(obj, POINTER(T)) returns a pointer that views the same underlying memory as obj (not a copy), and accessing pointer.contents constructs a fresh ctypes wrapper each time—not the original object. The current implementation stores the source object reference directly in contents, which is semantically different from CPython's behavior.

While RustPython's ctypes module is incomplete and this approach may prevent dangling pointers (keeping the source object alive via the stored reference), it diverges from CPython's expected semantics. If RustPython intends to match CPython's ctypes behavior, contents should dynamically construct a fresh wrapper rather than return a stored reference.

🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes.rs around lines 825-872, the current
pycfunction_cast stores the source object itself into the pointer's "contents"
attribute which keeps the original object alive and returns that same wrapper,
but CPython semantics require cast() to create a pointer that views the same
memory address and pointer.contents to construct a fresh wrapper each access.
Change the implementation so the created pointer object stores the raw pointer
address (derived from the source CData/CPointer or taken from an int) instead of
the source object reference, and make the pointer's contents access generate and
return a new wrapper object for the pointee type on every call (construct a
fresh instance bound to the stored address), leaving no persistent reference to
the original wrapper.

Comment on lines +215 to +268
#[pyclassmethod]
fn in_dll(
cls: PyTypeRef,
dll: PyObjectRef,
name: crate::builtins::PyStrRef,
vm: &VirtualMachine,
) -> PyResult {
use libloading::Symbol;

// Get the library handle from dll object
let handle = if let Ok(int_handle) = dll.try_int(vm) {
// dll is an integer handle
int_handle
.as_bigint()
.to_usize()
.ok_or_else(|| vm.new_value_error("Invalid library handle".to_owned()))?
} else {
// dll is a CDLL/PyDLL/WinDLL object with _handle attribute
dll.get_attr("_handle", vm)?
.try_int(vm)?
.as_bigint()
.to_usize()
.ok_or_else(|| vm.new_value_error("Invalid library handle".to_owned()))?
};

// Get the library from cache
let library_cache = crate::stdlib::ctypes::library::libcache().read();
let library = library_cache
.get_lib(handle)
.ok_or_else(|| vm.new_attribute_error("Library not found".to_owned()))?;

// Get symbol address from library
let symbol_name = format!("{}\0", name.as_str());
let inner_lib = library.lib.lock();

let symbol_address = if let Some(lib) = &*inner_lib {
unsafe {
// Try to get the symbol from the library
let symbol: Symbol<'_, *mut u8> = lib.get(symbol_name.as_bytes()).map_err(|e| {
vm.new_attribute_error(format!("{}: symbol '{}' not found", e, name.as_str()))
})?;
*symbol as usize
}
} else {
return Err(vm.new_attribute_error("Library is closed".to_owned()));
};

// For pointer types, we return a pointer to the symbol address
Ok(PyCPointer {
contents: PyRwLock::new(vm.ctx.new_int(symbol_address).into()),
}
.into_ref_with_type(vm, cls)?
.into())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Store reference to dll object to prevent library unload.

The implementation retrieves a symbol address from the library but doesn't store a reference to the dll object. If the library is closed or garbage collected, the symbol address becomes invalid, potentially causing segmentation faults.

Consider adding a mechanism to track the dll reference, similar to how PyCSimple::in_dll stores dll in cdata.base (lines 965-967 in base.rs). Since PyCPointer doesn't currently have a cdata field, you may need to add an optional reference holder.

Comment on lines +59 to +64
/// Convert a Python object to a C value and store it at the result pointer
fn python_to_ffi(obj: PyResult, ty: &PyTypeRef, result: *mut c_void, vm: &VirtualMachine) {
let obj = match obj {
Ok(o) => o,
Err(_) => return, // Exception occurred, leave result as-is
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Zero-initialize result buffer on exception.

When the Python callable raises an exception, the function returns without writing to result, leaving uninitialized garbage for the C caller. While RustPython may have exception propagation semantics that handle this, consider zero-initializing the result for defensive programming:

let obj = match obj {
    Ok(o) => o,
    Err(_) => {
        // Zero-initialize result on error
        unsafe { std::ptr::write_bytes(result, 0, 1); }
        return;
    }
};
🤖 Prompt for AI Agents
In crates/vm/src/stdlib/ctypes/thunk.rs around lines 59-64, when the Python call
returns Err the function currently returns without writing to the result pointer
leaving uninitialized memory; change the Err branch to zero-initialize the
result buffer before returning: check that result is non-null, compute the byte
size of the target C type for this PyTypeRef (the size used for FFI storage) and
call unsafe std::ptr::write_bytes(result, 0, size) to clear that many bytes,
then return; ensure the write is performed inside an unsafe block and uses the
correct size for the FFI type rather than a fixed 1 byte.

@youknowone youknowone merged commit 6782fa2 into RustPython:main Nov 29, 2025
12 of 13 checks passed
@youknowone youknowone deleted the ctypes-buffer branch November 29, 2025 12:17
This was referenced Dec 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant