Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 7, 2025

Summary by CodeRabbit

  • New Features

    • Added umask() support on Windows for setting process file mode creation mask.
  • Improvements

    • Major overhaul of Windows file metadata/stat handling for more accurate file types, symlink/reparse handling, and fewer edge-case errors.
    • Updated system configuration defaults to include an EXT_SUFFIX of ".pyd" and expose SOABI.
  • Chores

    • Added SOABI token to the spelling/dictionary list.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 7, 2025

Walkthrough

Adds Windows-specific stat/xstat implementation and helpers, exposes a winsock initializer, adds an FFI-backed umask wrapper in the nt stdlib module, extends sysconfig config vars (EXT_SUFFIX, SOABI, Py_DEBUG), and inserts the token SOABI into the CPython dictionary file.

Changes

Cohort / File(s) Summary
Dictionary Updates
.cspell.dict/cpython.txt
Added token SOABI to the CPython word list
Stdlib: nt & sysconfig
crates/vm/src/stdlib/nt.rs, crates/vm/src/stdlib/sysconfig.rs
Added extern C _umask declaration and a #[pyfunction] umask(mask: i32) -> i32 wrapper; extended config_vars with EXT_SUFFIX (".pyd"), SOABI (None), and Py_DEBUG (false) entries
Windows file metadata & stat/xstat
crates/vm/src/windows.rs
Introduced comprehensive Windows stat/xstat slow-path implementation, new file-attribute and mode constants, reparse-point/symlink handling, FILETIME/time conversions, inode/file-id logic, attributes_from_dir, attribute_data_to_stat, and pub fn init_winsock()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Focus review on Windows API usage, reparse-point and symlink handling, and FILETIME-to-epoch conversions in crates/vm/src/windows.rs.
  • Verify safety and correct linkage for the new extern _umask and the umask wrapper in crates/vm/src/stdlib/nt.rs.
  • Confirm EXT_SUFFIX hardcoding and other config choices in crates/vm/src/stdlib/sysconfig.rs align with intended packaging behavior.

Possibly related PRs

Poem

🐰 A tiny hop through files and time,
I add a mask and parse a rhyme,
SOABI tucked into the list,
Windows stats now gently kissed,
Hooray—this rabbit leaves a chime! 🎋

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly references three key changes in the changeset: windows umask (new FFI binding in nt.rs), win32_xstat_slow_impl (new Windows stat implementation in windows.rs), and fake EXT_SUFFIX (sysconfig.rs initialization).
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the nt branch 2 times, most recently from 8dcbecb to 0c70492 Compare December 7, 2025 16:56
@youknowone youknowone marked this pull request as ready for review December 7, 2025 16:56
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
crates/vm/src/windows.rs (2)

400-408: Type mismatch: using local struct with windows_sys enum constant.

Line 405 uses FileAttributeTagInfo which is imported as a FILE_INFO_BY_HANDLE_CLASS enum variant from windows_sys (line 283), but you're using it as both the enum value (line 405) and casting to a local struct type (line 401, 406). This works because the enum value happens to match the expected constant, but the struct being zeroed is the local FileAttributeTagInfo, not the windows_sys structure.

This is fragile and confusing. Consider:

+// Use explicit constant for clarity
+const FILE_ATTRIBUTE_TAG_INFO_CLASS: i32 = 9; // FileAttributeTagInfo enum value

 let ret = unsafe {
     GetFileInformationByHandleEx(
         h_file,
-        FileAttributeTagInfo,
+        FILE_ATTRIBUTE_TAG_INFO_CLASS,
         &mut local_tag_info as *mut _ as *mut _,
         std::mem::size_of::<FileAttributeTagInfo>() as u32,
     )
 };

Or rename the local struct to avoid the collision entirely.


370-371: Consider RAII wrapper for handle cleanup.

The closure pattern for ensuring handle cleanup works, but it's somewhat unusual in Rust. A more idiomatic approach would be to use an RAII wrapper that automatically closes the handle on drop:

struct OwnedHandle(RAW_HANDLE);
impl Drop for OwnedHandle {
    fn drop(&mut self) {
        if self.0 != INVALID_HANDLE_VALUE {
            unsafe { CloseHandle(self.0) };
        }
    }
}

This is optional since the current approach is correct; it's just a style suggestion for future maintenance.

Also applies to: 493-500

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cab41c8 and 0c70492.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_genericpath.py is excluded by !Lib/**
  • Lib/test/test_logging.py is excluded by !Lib/**
  • Lib/test/test_ntpath.py is excluded by !Lib/**
  • Lib/test/test_shutil.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • .cspell.dict/cpython.txt (1 hunks)
  • crates/vm/src/stdlib/nt.rs (1 hunks)
  • crates/vm/src/stdlib/sysconfig.rs (1 hunks)
  • crates/vm/src/windows.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/nt.rs
  • crates/vm/src/windows.rs
  • crates/vm/src/stdlib/sysconfig.rs
🧠 Learnings (4)
📓 Common learnings
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/stdlib/nt.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/nt.rs
  • crates/vm/src/stdlib/sysconfig.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/nt.rs
  • crates/vm/src/stdlib/sysconfig.rs
🧬 Code graph analysis (2)
crates/vm/src/stdlib/nt.rs (1)
crates/vm/src/stdlib/posix.rs (1)
  • umask (1328-1330)
crates/vm/src/windows.rs (1)
crates/vm/src/stdlib/winapi.rs (2)
  • CloseHandle (83-85)
  • GetFileType (163-173)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (5)
.cspell.dict/cpython.txt (1)

49-49: LGTM!

The SOABI token is correctly added to the dictionary in alphabetical order. This aligns with the new SOABI config variable introduced in sysconfig.rs.

crates/vm/src/windows.rs (4)

113-130: LGTM!

The attributes_to_mode function correctly ports CPython's logic for converting Windows file attributes to Unix-style mode bits.


132-227: LGTM!

The attribute_data_to_stat function is a comprehensive port of CPython's _Py_attribute_data_to_stat. The time conversion logic correctly handles the Windows-to-Unix epoch difference, and the 128-bit file ID support via st_ino_high is properly implemented.


229-267: LGTM!

The attributes_from_dir fallback correctly uses FindFirstFileW when CreateFileW fails due to access/sharing issues. The handle cleanup is properly handled.


534-535: LGTM!

The fallback to win32_xstat_slow_impl is correctly integrated, providing robust handling when the fast path (GetFileInformationByName) cannot be used.

Comment on lines +10 to +21

// FIXME: This is an entirely wrong implementation of EXT_SUFFIX.
// EXT_SUFFIX must be a string starting with "." for pip compatibility
// Using ".pyd" causes pip's _generic_abi() to fall back to _cpython_abis()
vars.set_item("EXT_SUFFIX", ".pyd".to_pyobject(vm), vm)
.unwrap();
vars.set_item("SOABI", vm.ctx.none(), vm).unwrap();

vars.set_item("Py_GIL_DISABLED", true.to_pyobject(vm), vm)
.unwrap();
vars.set_item("Py_DEBUG", false.to_pyobject(vm), vm)
.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the sysconfig.rs file to see its full context
wc -l crates/vm/src/stdlib/sysconfig.rs

Repository: RustPython/RustPython

Length of output: 186


🏁 Script executed:

# Check the complete sysconfig.rs file and its module definition
cat -n crates/vm/src/stdlib/sysconfig.rs

Repository: RustPython/RustPython

Length of output: 186


🏁 Script executed:

# Search for how sysconfig is registered/loaded and any platform guards
rg -n "sysconfig" crates/vm/src/stdlib/ --type rust -B 3 -A 3

Repository: RustPython/RustPython

Length of output: 186


🏁 Script executed:

# Check for platform-specific gates on sysconfig module
rg -n "#\[cfg" crates/vm/src/stdlib/sysconfig.rs

Repository: RustPython/RustPython

Length of output: 186


🏁 Script executed:

# Search for EXT_SUFFIX usage across the codebase
rg -n "EXT_SUFFIX" crates/vm/src/ --type rust -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 186


🌐 Web query:

CPython sysconfig EXT_SUFFIX platform specific Windows Unix macOS

💡 Result:

Short summary — what EXT_SUFFIX is and platform differences, and how to get it:

  • EXT_SUFFIX is the filename suffix used for compiled Python extension modules (shared objects) for the current build — e.g. on POSIX you usually get .cpython-XY... .so (includes the SOABI), on Windows you get .pyd (often with a platform/ABI tag like .cp37-win_amd64.pyd when full suffix is available). [1][2]

  • How CPython exposes it:

    • sysconfig.get_config_var('EXT_SUFFIX') — canonical API to query the extension suffix. On many Unix/macOS builds this returns the full tagged suffix (e.g. ".cpython-39m-darwin.so"). On some Windows builds it may return only ".pyd" (SOABI may be None). [2][5]
    • If you need the fuller Windows tag (e.g. ".cp37-win_amd64.pyd"), distutils.sysconfig.get_config_var('EXT_SUFFIX') historically returned the full value on Windows in some builds (useful as a fallback). Note: distutils is deprecated but still present in many Python versions; prefer sysconfig when possible. [0]
  • Practical snippet:

    • import sysconfig; ext = sysconfig.get_config_var('EXT_SUFFIX')
    • fallback (Windows): import distutils.sysconfig; ext = distutils.sysconfig.get_config_var('EXT_SUFFIX') if ext is None or ext == '.pyd' else ext. [0][1]

References:

  • Python sysconfig docs (describe get_config_var / get_platform / usage). [1][2]
  • StackOverflow discussion showing Windows behavior and distutils fallback. [0]

Sources:
[0] StackOverflow — "Python generate SOABI/Extension Suffix (pep 3149) for Windows".
[1] Python docs — sysconfig (library documentation).
[2] Python docs — Configure/using Python (notes on platform/config).


🌐 Web query:

RustPython sysconfig module EXT_SUFFIX implementation

💡 Result:

Do you want (a) the RustPython source location that implements sysconfig/EXT_SUFFIX, (b) an explanation of how EXT_SUFFIX should behave (CPython behaviour), or (c) help implementing/fixing EXT_SUFFIX in RustPython?


Hardcoded .pyd EXT_SUFFIX will break on Unix/macOS platforms.

EXT_SUFFIX must be platform-specific: Windows uses .pyd, while Unix/macOS require .cpython-XY-{platform}.so with SOABI included. The current hardcoded .pyd value will cause extension module discovery to fail on non-Windows platforms.

Ensure either:

  1. This module is Windows-only (gated with #[cfg(windows)]), or
  2. EXT_SUFFIX is set conditionally based on the target platform, with appropriate SOABI values for Unix/macOS
🤖 Prompt for AI Agents
crates/vm/src/stdlib/sysconfig.rs lines 10-21: the code hardcodes EXT_SUFFIX to
".pyd" which is only valid on Windows and misses SOABI for Unix/macOS; change
this by gating the Windows-specific assignment behind #[cfg(windows)] (or
runtime cfg!(windows)) and for non-Windows targets set EXT_SUFFIX to a
dot-prefixed SO filename (e.g. the cpython-style suffix that includes
ABI/version and platform) and populate SOABI with the corresponding value;
ensure EXT_SUFFIX always begins with '.' for pip compatibility and
unwrap/error-handle setters as before.

Comment on lines +105 to +111
/// FILE_ATTRIBUTE_TAG_INFO structure for GetFileInformationByHandleEx
#[repr(C)]
#[derive(Default)]
struct FileAttributeTagInfo {
file_attributes: u32,
reparse_tag: u32,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Name collision with windows_sys::FileAttributeTagInfo.

Line 283 imports FileAttributeTagInfo from windows_sys::Win32::Storage::FileSystem, but you've also defined a local struct with the same name here. This creates ambiguity and the local struct shadows the imported one.

Consider renaming the local struct to avoid confusion:

-struct FileAttributeTagInfo {
+struct LocalFileAttributeTagInfo {
     file_attributes: u32,
     reparse_tag: u32,
 }

Or remove the local definition and use the windows_sys type directly if the layout matches.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/vm/src/windows.rs around lines 105 to 111, the locally defined
FileAttributeTagInfo struct conflicts with the imported
windows_sys::Win32::Storage::FileSystem::FileAttributeTagInfo, causing a name
collision and shadowing; either rename the local struct (e.g.,
LocalFileAttributeTagInfo or FileAttributeTagInfoLocal) and update all local
references to that new name, or remove the local definition and replace its
usages with the windows_sys type directly if the layout matches; ensure imports
are adjusted and any conversions/mapping between types are added if you keep a
separate local type.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/vm/src/windows.rs (1)

105-111: FileAttributeTagInfo naming is slightly confusing but not incorrect.

Defining a local struct FileAttributeTagInfo while also importing the FileAttributeTagInfo enum value from windows_sys is valid (type vs value namespace), but it’s easy to misread. Consider a small rename of the struct (e.g. LocalFileAttributeTagInfo) to make the distinction clearer for future readers.

Also applies to: 400-408

🧹 Nitpick comments (2)
crates/vm/src/stdlib/sysconfig.rs (1)

11-21: EXT_SUFFIX stub is knowingly wrong; please follow up with a platform-aware implementation.

The .pyd value and SOABI = None make _sysconfig diverge from CPython on all non‑Windows platforms (and even on Windows, it’s missing ABI tagging). Since the FIXME already calls this out and the PR title mentions “fake EXT_SUFFIX”, it’s fine as a temporary stub, but this will need a follow‑up that:

  • Computes EXT_SUFFIX conditionally per target (Windows vs Unix), and
  • Keeps EXT_SUFFIX and SOABI consistent (e.g. EXT_SUFFIX derived from SOABI).

Classifying as a reminder rather than a blocker for this PR.

crates/vm/src/windows.rs (1)

231-267: attributes_from_dir fallback is correct but could benefit from a brief comment on limitations.

The FindFirstFileW-based fallback populates a minimal BY_HANDLE_FILE_INFORMATION and reparse tag when CreateFileW fails for access/sharing reasons, which matches CPython’s approach.

A tiny nit: a short comment noting that volume serial / file index aren’t available via this path (and will keep their zeroed defaults) would clarify why inode/dev fields may differ vs the direct handle-based path.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c70492 and 8317e6d.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_genericpath.py is excluded by !Lib/**
  • Lib/test/test_logging.py is excluded by !Lib/**
  • Lib/test/test_ntpath.py is excluded by !Lib/**
  • Lib/test/test_shutil.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • .cspell.dict/cpython.txt (1 hunks)
  • crates/vm/src/stdlib/nt.rs (1 hunks)
  • crates/vm/src/stdlib/sysconfig.rs (1 hunks)
  • crates/vm/src/windows.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/vm/src/stdlib/nt.rs
  • .cspell.dict/cpython.txt
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/sysconfig.rs
  • crates/vm/src/windows.rs
🧠 Learnings (4)
📓 Common learnings
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/sysconfig.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/vm/src/stdlib/sysconfig.rs
📚 Learning: 2025-06-28T16:31:03.991Z
Learnt from: arihant2math
Repo: RustPython/RustPython PR: 5790
File: build.rs:2-2
Timestamp: 2025-06-28T16:31:03.991Z
Learning: In Cargo build scripts (build.rs), the environment variable CARGO_CFG_TARGET_OS is guaranteed to exist and is automatically set by Cargo during the build process, making unwrap() safe to use when accessing this variable.

Applied to files:

  • crates/vm/src/stdlib/sysconfig.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
🔇 Additional comments (2)
crates/vm/src/windows.rs (2)

96-130: Mode mapping from Windows attributes looks correct and matches CPython semantics.

The new constants and attributes_to_mode() logic (directory vs regular file, read‑only vs writable, execute bits for directories) align with CPython’s attributes_to_mode behavior. This is a solid foundation for the rest of the stat translation.


270-501: win32_xstat_slow_impl and fallback wiring in win32_xstat_impl look faithful to CPython.

  • Error handling around CreateFileW mirrors CPython’s special cases (access denied/sharing violation → directory fallback, invalid parameter → retry with GENERIC_READ, ERROR_CANT_ACCESS_FILE with traverse → unhandled reparse handling).
  • Non‑disk handles (char devices, pipes, unknown type) are mapped to appropriate st_mode bits and short‑circuit with a defaulted StatStruct.
  • Reparse tag probing with GetFileInformationByHandleEx(FileAttributeTagInfo, ...) plus the is_unhandled_tag logic and recursive traversal for non‑link reparse points matches the CPython control flow.
  • Optional FILE_BASIC_INFO and FILE_ID_INFO are used when available; failures downgrade gracefully.
  • Handle lifetime is correctly managed: all successful open paths are closed once, and recursive win32_xstat_slow_impl(path, true) calls close h_file before recursing, avoiding leaks/double‑closes.

Overall this is a strong, careful port; I don’t see functional bugs in the flow.

It’d be worth running the existing CPython Windows stat/xstat test suite (or RustPython’s equivalent) against this implementation on real Windows targets, particularly cases involving:

  • Junctions and non‑symlink reparse points,
  • Unhandled tags with traverse=True/False,
  • Special paths like \\.\CON and named pipes.

Comment on lines +134 to +227
fn attribute_data_to_stat(
info: &windows_sys::Win32::Storage::FileSystem::BY_HANDLE_FILE_INFORMATION,
reparse_tag: u32,
basic_info: Option<&windows_sys::Win32::Storage::FileSystem::FILE_BASIC_INFO>,
id_info: Option<&windows_sys::Win32::Storage::FileSystem::FILE_ID_INFO>,
) -> StatStruct {
use crate::common::fileutils::windows::SECS_BETWEEN_EPOCHS;
use windows_sys::Win32::Storage::FileSystem::FILE_ATTRIBUTE_REPARSE_POINT;

let mut st_mode = attributes_to_mode(info.dwFileAttributes);
let st_size = ((info.nFileSizeHigh as u64) << 32) | (info.nFileSizeLow as u64);
let st_dev = id_info
.map(|id| id.VolumeSerialNumber as u32)
.unwrap_or(info.dwVolumeSerialNumber);
let st_nlink = info.nNumberOfLinks as i32;

// Convert FILETIME/LARGE_INTEGER to (time_t, nsec)
let filetime_to_time = |ft_low: u32, ft_high: u32| -> (libc::time_t, i32) {
let ticks = ((ft_high as i64) << 32) | (ft_low as i64);
let nsec = ((ticks % 10_000_000) * 100) as i32;
let sec = (ticks / 10_000_000 - SECS_BETWEEN_EPOCHS) as libc::time_t;
(sec, nsec)
};

let large_integer_to_time = |li: i64| -> (libc::time_t, i32) {
let nsec = ((li % 10_000_000) * 100) as i32;
let sec = (li / 10_000_000 - SECS_BETWEEN_EPOCHS) as libc::time_t;
(sec, nsec)
};

let (st_birthtime, st_birthtime_nsec);
let (st_mtime, st_mtime_nsec);
let (st_atime, st_atime_nsec);

if let Some(bi) = basic_info {
(st_birthtime, st_birthtime_nsec) = large_integer_to_time(bi.CreationTime);
(st_mtime, st_mtime_nsec) = large_integer_to_time(bi.LastWriteTime);
(st_atime, st_atime_nsec) = large_integer_to_time(bi.LastAccessTime);
} else {
(st_birthtime, st_birthtime_nsec) = filetime_to_time(
info.ftCreationTime.dwLowDateTime,
info.ftCreationTime.dwHighDateTime,
);
(st_mtime, st_mtime_nsec) = filetime_to_time(
info.ftLastWriteTime.dwLowDateTime,
info.ftLastWriteTime.dwHighDateTime,
);
(st_atime, st_atime_nsec) = filetime_to_time(
info.ftLastAccessTime.dwLowDateTime,
info.ftLastAccessTime.dwHighDateTime,
);
}

// Get file ID from id_info or fallback to file index
let (st_ino, st_ino_high) = if let Some(id) = id_info {
// FILE_ID_INFO.FileId is FILE_ID_128 which is [u8; 16]
let bytes = id.FileId.Identifier;
let low = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
let high = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
(low, high)
} else {
let ino = ((info.nFileIndexHigh as u64) << 32) | (info.nFileIndexLow as u64);
(ino, 0u64)
};

// Set symlink mode if applicable
if info.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT != 0
&& reparse_tag == IO_REPARSE_TAG_SYMLINK
{
st_mode = (st_mode & !S_IFMT) | S_IFLNK;
}

StatStruct {
st_dev,
st_ino,
st_ino_high,
st_mode,
st_nlink,
st_uid: 0,
st_gid: 0,
st_rdev: 0,
st_size,
st_atime,
st_atime_nsec,
st_mtime,
st_mtime_nsec,
st_ctime: 0, // Will be set by caller
st_ctime_nsec: 0,
st_birthtime,
st_birthtime_nsec,
st_file_attributes: info.dwFileAttributes,
st_reparse_tag: reparse_tag,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

CPython fileutils.c _Py_attribute_data_to_stat Windows implementation Python 3.12

💡 Result:

Summary — what CPython 3.12 does on Windows in fileutils.c for _Py_attribute_data_to_stat

  • Purpose: convert Windows file attribute structures into CPython's internal struct _Py_stat_struct so os.stat()/lstat()/DirEntry.stat() etc. can return Python stat results on Windows. [1], [2].

  • Inputs (Windows): BY_HANDLE_FILE_INFORMATION *info, ULONG reparse_tag, FILE_BASIC_INFO *basic_info, FILE_ID_INFO *id_info. Output: fills struct _Py_stat_struct *result. [1], [2].

  • Key fields set:

    • st_mode: derived from Windows FILE_ATTRIBUTE_* via attributes_to_mode() (sets S_IFDIR/S_IFREG and permission bits). [2]
    • st_size: combined from info->nFileSizeHigh and info->nFileSizeLow into a 64-bit size. [2]
    • st_ino: taken from FileIdInfo (FILE_ID_INFO / FILE_ID_128) when available; otherwise from fileInfo.nFileIndexHigh/nFileIndexLow (handled as 64-bit inode). (CPython has historically fixed signed/unsigned inode casting issues.) [2], [3]
    • st_nlink: set from BY_HANDLE_FILE_INFORMATION (nNumberOfLinks) when available (or default 1). [2]
    • times (atime/mtime/ctime / st_atime_ns etc.): converted from FILE_BASIC_INFO / FILETIME to time_t / nanoseconds using internal helpers (e.g. _Py_time_t_to_FILE_TIME / conversions). [1], [2]
    • Windows-specific fields: st_file_attributes (dwFileAttributes) and st_reparse_tag get populated when supported. [1], [2]
  • Behavior notes:

    • If FileIdInfo is not available, CPython falls back to file index fields. The implementation handles 64-bit sizes/inos to avoid overflow issues. [2], [3]
    • The function is declared in Modules/posixmodule.c and implemented in Python/fileutils.c on Windows builds. [1], [2]

Sources

  • declaration and usage in Modules/posixmodule.c (Windows-specific stat helpers). [1]
  • implementation of _Py_attribute_data_to_stat in Python/fileutils.c (code that sets st_mode, st_size, st_ino, times, st_file_attributes, st_reparse_tag, etc.). [2]
  • Python issue discussion about inode signed/unsigned handling (relevant to how st_ino is handled). [3]

References
[1] Modules/posixmodule.c — CPython (Windows declarations/usages).
[2] Python/fileutils.c — implementation of _Py_attribute_data_to_stat.
[3] bpo-29619 discussion / commit about st_ino unsigned conversion.


🏁 Script executed:

# First, let's examine the RustPython implementation more closely
fd -t f "windows.rs" --path "*/vm/src/*" | head -5

Repository: RustPython/RustPython

Length of output: 186


🌐 Web query:

CPython Python/fileutils.c st_ctime st_birthtime _Py_attribute_data_to_stat 3.12 source code

💡 Result:

Do you want the full source of fileutils.c (Windows/PC or the POSIX one), or just the implementation of _Py_attribute_data_to_stat and related st_ctime / st_birthtime handling from CPython 3.12? I can fetch and paste the exact code with file/line citations.


🌐 Web query:

CPython os.stat_result fields Windows FILE_ATTRIBUTE handling 3.12

💡 Result:

Short summary (CPython 3.12)

  • os.stat_result on Windows includes these Windows-specific members:
    • st_file_attributes — the raw dwFileAttributes (DWORD) from BY_HANDLE_FILE_INFORMATION. Use the FILE_ATTRIBUTE_* constants in the stat module to test bits. [1][2]
    • st_reparse_tag — present when FILE_ATTRIBUTE_REPARSE_POINT is set; contains the reparse tag (IO_REPARSE_TAG_*). [1]
  • Cross‑platform/Unix-compatible members (st_mode, st_size, st_atime, st_mtime, etc.) remain present; when accessed as a tuple the first 10 entries follow the traditional order. [1]
  • Platform differences/notes on Windows:
    • st_ino, st_dev and st_nlink are set to zero by the Windows implementation (use os.stat() for best available values). [1]
    • In Python 3.12 the st_ctime attribute is deprecated on Windows; the creation time is exposed as st_birthtime. st_ctime may be changed in future. [1]
  • Use the stat module’s FILE_ATTRIBUTE_* constants (e.g. stat.FILE_ATTRIBUTE_ARCHIVE, FILE_ATTRIBUTE_REPARSE_POINT, FILE_ATTRIBUTE_HIDDEN, FILE_ATTRIBUTE_READONLY, etc.) to inspect st_file_attributes. [3]

Sources:

  • os.stat_result and Windows-specific fields (Python 3.12 docs). [1]
  • os.stat Windows notes (st_ino/st_dev/st_nlink zero; st_ctime deprecation). [1]
  • stat module FILE_ATTRIBUTE_* constant list. [3]

References
[1] Python 3.12 os — Miscellaneous operating system interfaces documentation.
[2] (within same os docs) description of st_reparse_tag and st_file_attributes.
[3] Python stat module documentation listing FILE_ATTRIBUTE_* constants.


🌐 Web query:

CPython 3.12 Windows st_ino st_dev st_nlink zero why design decision fileutils.c

💡 Result:

Short answer: CPython on Windows uses whatever file metadata the platform/filesystem APIs provide, and many Windows filesystems or providers (network mounts, WebDAV/SharePoint, some UNC/shadow copies, FUSE-like drivers) do not expose POSIX-style device/inode/link-count values — so CPython populates st_ino, st_dev and st_nlink with 0 (dummy values) when the underlying API doesn't give meaningful numbers. This is a deliberate compatibility/portability choice in the Windows codepaths (and in the samefile/sameopenfile logic): treat zero-valued fields as “not available” and fall back to safer path/handle-based comparisons rather than trusting those fields. [1][2][3]

Why this design:

  • Windows doesn’t guarantee POSIX inode/device semantics across all filesystems; some filesystems return no useful file index/device info, so returning 0 is the clean way to indicate “unavailable.” The docs note that some stat members are dummy on Windows and that st_ino may be up to 128 bits when available. [1]
  • The Python Windows implementation then uses handle-based final-path comparisons (GetFinalPathNameByHandle and related logic) or other fallbacks when st_dev/st_ino are zero to avoid incorrect equality tests. The Python issue tracker discussion and patches show this behavior and the rationale (special-casing zero-valued fields and falling back to final NT path/handle checks). [2]
  • This is a common cross-platform problem (e.g. C++ std::filesystem also had to special-case “no inode” Windows filesystems); libraries prefer explicit fallbacks over inventing fake inode/device values. [3]

References

  • Python os.stat / Windows notes (3.12 docs): some stat members are dummy on Windows; st_ino changes noted. [1]
  • Python issue discussion and implementation notes about samefile/sameopenfile and zero-valued st_dev/st_ino fallbacks (issue 33935 and related tickets). [2]
  • Similar reasoning in other projects (example: libstdc++ handling of missing inode numbers on Windows). [3]

attribute_data_to_stat deviates from CPython's intentional Windows behavior for st_ino, st_dev, and st_nlink.

CPython 3.12 deliberately sets st_ino, st_dev, and st_nlink to zero on Windows to indicate "not available," because Windows filesystems do not guarantee POSIX inode/device semantics. This is a deliberate design choice: CPython uses handle-based fallbacks and final-path comparisons when these fields are zero, rather than trusting potentially meaningless values from different filesystems (network mounts, WebDAV, UNC paths, etc.).

RustPython's implementation populates:

  • st_dev from VolumeSerialNumber
  • st_ino from FILE_ID_INFO or file index fields
  • st_nlink from nNumberOfLinks

This diverges from CPython's approach and may cause code relying on samefile()-like logic to behave unexpectedly compared to CPython on the same Windows filesystem. Set these three fields to zero instead, matching CPython 3.12's Windows conventions.

@youknowone youknowone merged commit 876368e into RustPython:main Dec 8, 2025
13 checks passed
@youknowone youknowone deleted the nt branch December 8, 2025 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant