@youknowone
Member

@youknowone youknowone commented Dec 8, 2025

Summary by CodeRabbit

  • Breaking Changes

    • DirEntry and ScandirIterator objects can no longer be directly instantiated; they are now system-generated only.
  • Bug Fixes

    • Improved detection and handling of Windows directory symlinks and junctions in file operations.
  • Refactor

    • Consolidated Windows path string handling across modules for consistency and maintainability.


@coderabbitai
Contributor

coderabbitai bot commented Dec 8, 2025

Walkthrough

This PR consolidates Windows string encoding across multiple stdlib modules by replacing scattered OsStrExt::encode_wide() calls with a centralized ToWideString trait from crate::common::windows. Additionally, DirEntry and ScandirIterator are marked as Unconstructible to prevent direct construction, and Windows-specific symlink detection is added.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Windows wide string encoding refactoring: crates/common/src/fileutils.rs, crates/vm/src/stdlib/codecs.rs, crates/vm/src/stdlib/nt.rs, crates/vm/src/stdlib/sys.rs, crates/vm/src/windows.rs | Replaces direct OsStrExt::encode_wide() usage with ToWideString trait methods (to_wide() and to_wide_with_nul()), removing scattered Windows-specific encoding logic in favor of a centralized utility. No functional changes; identical outcomes with simplified code. |
| OS module constructibility and symlink handling: crates/vm/src/stdlib/os.rs | Marks DirEntry and ScandirIterator as Unconstructible to prevent direct construction; adds Windows-specific is_dir_link() for improved symlink detection using GetFileAttributesW; imports ToWideString for consistent wide string handling. |
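
To make the difference between to_wide() and to_wide_with_nul() concrete, here is a minimal sketch of what such a trait could look like. It is not the code from crate::common::windows: the trait below is a hypothetical stand-in implemented for str and built on str::encode_utf16() so that it compiles on any platform, whereas the real trait presumably wraps OsStrExt::encode_wide() for OS strings.

```rust
/// Hypothetical stand-in for the `ToWideString` trait named in the walkthrough.
/// Assumption: the real trait lives in `crate::common::windows` and is built on
/// `OsStrExt::encode_wide()`; this version uses `encode_utf16()` for portability.
trait ToWideString {
    /// UTF-16 code units without a terminator, for APIs that take an explicit length.
    fn to_wide(&self) -> Vec<u16>;
    /// UTF-16 code units followed by a single 0, for APIs that expect a
    /// null-terminated wide string.
    fn to_wide_with_nul(&self) -> Vec<u16>;
}

impl ToWideString for str {
    fn to_wide(&self) -> Vec<u16> {
        self.encode_utf16().collect()
    }

    fn to_wide_with_nul(&self) -> Vec<u16> {
        // Append exactly one terminating 0 after the encoded units.
        self.encode_utf16().chain(std::iter::once(0)).collect()
    }
}
```

With this sketch, "ab".to_wide() yields the two code units for "a" and "b", and "ab".to_wide_with_nul() yields the same two units followed by 0. Which variant a call site needs depends on whether the Windows API receives a length or scans for the terminator.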

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Multiple files affected with mostly straightforward method replacements
  • crates/vm/src/stdlib/os.rs introduces new trait implementations and Windows-specific logic that warrants closer inspection
  • Verify that to_wide() and to_wide_with_nul() are correctly applied in each context (string termination requirements vary across Windows API calls)
  • Confirm is_dir_link() logic correctly handles directory symlinks and junctions

Possibly related PRs

  • windows codecs #6337 — Directly modifies the same mbcs_encode and oem_encode functions in codecs.rs with identical OsStrExt to ToWideString refactoring.

Suggested reviewers

  • ShaharNaveh

Poem

🐰 Wide strings once scattered, now unified with care,
One trait to encode them all, from here to there,
DirEntry and Scanner, now locked from careless hands,
Symlinks detected true on Windows lands!
~The Code-Tending Rabbit 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'Fix os.remove' is vague and does not clearly describe the substantive changes made across multiple files to refactor Windows string handling and enhance directory unlinking behavior. | Provide a more specific title that describes the main architectural change, such as 'Refactor Windows path encoding to use ToWideString trait' or 'Improve os.remove with Windows directory symlink detection'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3d638a and 06c4bf7.

⛔ Files ignored due to path filters (2)
  • Lib/test/test_os.py is excluded by !Lib/**
  • Lib/test/test_shutil.py is excluded by !Lib/**
📒 Files selected for processing (6)
  • crates/common/src/fileutils.rs (1 hunks)
  • crates/vm/src/stdlib/codecs.rs (4 hunks)
  • crates/vm/src/stdlib/nt.rs (2 hunks)
  • crates/vm/src/stdlib/os.rs (7 hunks)
  • crates/vm/src/stdlib/sys.rs (2 hunks)
  • crates/vm/src/windows.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/common/src/fileutils.rs
  • crates/vm/src/windows.rs
  • crates/vm/src/stdlib/nt.rs
  • crates/vm/src/stdlib/codecs.rs
  • crates/vm/src/stdlib/sys.rs
  • crates/vm/src/stdlib/os.rs
🧠 Learnings (1)
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/stdlib/os.rs
🧬 Code graph analysis (1)
crates/vm/src/stdlib/os.rs (2)
crates/derive/src/lib.rs (1)
  • pyclass (124-128)
crates/derive-impl/src/lib.rs (1)
  • pyclass (41-47)
🔇 Additional comments (14)
crates/vm/src/stdlib/nt.rs (2)

18-18: LGTM!

The import of ToWideString from common::windows consolidates Windows wide-string handling, aligning with the broader refactor across the codebase.


540-540: LGTM!

The switch from encode_wide().collect() to to_wide() uses the centralized ToWideString trait, improving consistency across the Windows-specific code paths.

crates/vm/src/stdlib/codecs.rs (2)

240-240: LGTM!

The refactor to use ToWideString::to_wide() in mbcs_encode is consistent with the centralized Windows wide-string utilities.

Also applies to: 262-262


442-442: LGTM!

Consistent application of ToWideString::to_wide() in oem_encode, matching the pattern established in mbcs_encode.

Also applies to: 464-464

crates/common/src/fileutils.rs (2)

26-26: LGTM!

The refactor to use ToWideString::to_wide() for file extension parsing maintains the same behavior while using the centralized utility.

Also applies to: 77-77


306-306: LGTM!

Correct use of to_wide_with_nul() for Windows API calls that require null-terminated wide strings.

Also applies to: 333-333

crates/vm/src/windows.rs (2)

10-10: LGTM!

Import of ToWideString aligns with the centralized Windows wide-string handling approach.


244-244: LGTM!

Correct use of to_wide_with_nul() for Windows API calls (FindFirstFileW, CreateFileW) that require null-terminated wide strings.

Also applies to: 289-289

crates/vm/src/stdlib/sys.rs (2)

552-552: LGTM!

Correct use of to_wide_with_nul() for GetModuleHandleW which requires a null-terminated wide string for the module name.

Also applies to: 555-555


592-592: LGTM!

Correct use of to_wide_with_nul() for VerQueryValueW sub-block parameter.

crates/vm/src/stdlib/os.rs (4)

155-156: LGTM!

The imports for ToWideString (Windows-only) and Unconstructible are correctly scoped and support the changes in this file.

Also applies to: 173-173


297-324: Core fix for os.remove on Windows - correctly handles directory symlinks/junctions.

This change properly distinguishes between regular files and directory symlinks/junctions on Windows:

  • Uses GetFileAttributesW which doesn't follow symlinks
  • Checks for both FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_REPARSE_POINT flags
  • Calls remove_dir for directory symlinks/junctions (matching CPython's Py_DeleteFileW behavior)
  • Falls through to remove_file for regular files and file symlinks

This aligns with CPython's handling where directory junctions and directory symlinks require RemoveDirectoryW instead of DeleteFileW.
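
The attribute check described above can be sketched as a pure function over the raw attribute bits. This is an illustration under assumptions, not the code from os.rs: is_dir_link_attrs, choose_remove_action, and RemoveAction are hypothetical names, and the actual GetFileAttributesW call is left out so the logic can be compiled and tested on any platform. The constant values are the ones defined by the Windows SDK.

```rust
// Win32 file attribute constants (values from the Windows SDK headers).
const FILE_ATTRIBUTE_DIRECTORY: u32 = 0x0000_0010;
const FILE_ATTRIBUTE_REPARSE_POINT: u32 = 0x0000_0400;
// Returned by GetFileAttributesW on failure; note that every bit is set.
const INVALID_FILE_ATTRIBUTES: u32 = u32::MAX;

/// Hypothetical enum naming the two removal paths described in the review.
#[derive(Debug, PartialEq)]
enum RemoveAction {
    /// Directory symlink or junction: remove with the directory API.
    RemoveDir,
    /// Regular file or file symlink: remove with the file API.
    RemoveFile,
}

/// Hypothetical helper: true only when the attributes are valid and carry
/// both the directory flag and the reparse-point flag.
fn is_dir_link_attrs(attrs: u32) -> bool {
    // The failure value must be rejected first, because it would otherwise
    // pass both bit tests.
    attrs != INVALID_FILE_ATTRIBUTES
        && (attrs & FILE_ATTRIBUTE_DIRECTORY) != 0
        && (attrs & FILE_ATTRIBUTE_REPARSE_POINT) != 0
}

/// Hypothetical dispatcher mirroring the fall-through described above.
fn choose_remove_action(attrs: u32) -> RemoveAction {
    if is_dir_link_attrs(attrs) {
        RemoveAction::RemoveDir
    } else {
        RemoveAction::RemoveFile
    }
}
```

In this sketch a plain directory (directory flag without the reparse-point flag) still takes the file path, so the underlying file-removal call is the one that reports the error, which is consistent with os.remove refusing to delete ordinary directories.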


498-498: LGTM!

Adding Unconstructible to DirEntry correctly prevents direct instantiation from Python code, matching CPython behavior where os.DirEntry cannot be constructed directly.

Also applies to: 676-676


686-686: LGTM!

Adding Unconstructible to ScandirIterator correctly prevents direct instantiation from Python code.

Also applies to: 704-704



@youknowone youknowone marked this pull request as ready for review December 8, 2025 16:47
@fanninpm
Contributor

fanninpm commented Dec 8, 2025

Does this fix support.rmtree()? If so, we can likely re-enable test_pathlib on the Windows CI.

@youknowone
Member Author

Good point. Unfortunately not yet, but it is about another error related to surrogate

@youknowone youknowone merged commit 42d0a58 into RustPython:main Dec 8, 2025
13 checks passed
@youknowone youknowone deleted the remove branch December 8, 2025 17:44
@fanninpm
Contributor

fanninpm commented Dec 8, 2025

> Good point. Unfortunately not yet, but it is about another error related to surrogate

It would be a good idea to mention that in .github/workflows/ci.yaml.


2 participants