-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Move __doc__ crate to crates/doc
#6234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis PR adds a new workspace crate Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant GH as GitHub Actions
participant Gen as Generate Job (multi-OS)
participant Merge as Merge Job
participant Artifact as src/data.inc.rs
participant Build as Cargo Build / crate compile
participant Macro as derive-impl runtime (compile-time include)
GH->>Gen: workflow_dispatch / push triggers
Gen->>Gen: run crates/doc/generate.py per OS
Gen-->>GH: upload platform JSON artifacts
GH->>Merge: start after Gen completes
Merge->>Merge: download artifacts & merge -> merged.json
Merge->>Merge: format merged -> phf map source
Merge->>Artifact: create src/data.inc.rs (DB)
Merge-->>GH: upload Rust source artifact
Build->>Artifact: include!("./data.inc.rs") at compile-time
Macro->>Macro: derive-impl calls DB.get("module.name")
Macro-->>Build: doc string included in generated code
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (2)📚 Learning: 2025-07-11T09:35:43.571ZApplied to files:
📚 Learning: 2025-07-11T09:35:43.571ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
.github/workflows/update-doc-db.yml
Outdated
| if-no-files-found: error | ||
| retention-days: 7 | ||
| overwrite: true | ||
| pr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ain't sure if we want to open a PR, or just push the generated artifacts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to update it only when target Python version is changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so probably not worth the effort in automating a PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
crates/rustpython_doc_db/src/lib.rs (1)
10-19: Consider expanding test coverage.The single test validates basic DB functionality, but additional tests would strengthen confidence in the generated database. Consider adding tests for:
- Multiple module paths (builtins, stdlib modules)
- Non-existent keys returning None
- Documentation content format validation
crates/rustpython_doc_db/generate.py (1)
36-36: Replace Python 3.12+ type statement with backwards-compatible syntax.The
typestatement for type aliases was introduced in Python 3.12. While the workflow specifies Python 3.13.9, using a more compatible syntax ensures the script works with older Python versions.Apply this diff:
-type Parts = tuple[str, ...] +Parts = tuple[str, ...]Alternatively, if type hints are important, use
typing.TypeAlias:+from typing import TypeAlias + -type Parts = tuple[str, ...] +Parts: TypeAlias = tuple[str, ...].github/workflows/update-doc-db.yml (1)
56-56: Intentionally disabled PR job with confusing condition.The condition
if: false && ${{ inputs.open_pr }}is always false due to the leadingfalse &&. While this effectively disables the job, it's confusing. Consider using a clearer approach:- if: false && ${{ inputs.open_pr }} + if: false # TODO: Re-enable when PR automation is readyOr remove the job entirely until it's ready to be used.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
-
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (11)
-
.gitattributes(1 hunks) -
.github/workflows/update-doc-db.yml(1 hunks) -
Cargo.toml(3 hunks) -
crates/rustpython_doc_db/Cargo.toml(1 hunks) -
crates/rustpython_doc_db/LICENSE(1 hunks) -
crates/rustpython_doc_db/generate.py(1 hunks) -
crates/rustpython_doc_db/src/lib.rs(1 hunks) -
derive-impl/Cargo.toml(2 hunks) -
derive-impl/src/lib.rs(0 hunks) -
derive-impl/src/pyclass.rs(2 hunks) -
derive-impl/src/pymodule.rs(3 hunks)
💤 Files with no reviewable changes (1)
- derive-impl/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Format Rust code with the default rustfmt style (runcargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management
Files:
derive-impl/src/pyclass.rsderive-impl/src/pymodule.rscrates/rustpython_doc_db/src/lib.rs
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
derive-impl/src/pyclass.rsderive-impl/src/pymodule.rscrates/rustpython_doc_db/src/lib.rscrates/rustpython_doc_db/Cargo.tomlCargo.tomlcrates/rustpython_doc_db/generate.pyderive-impl/Cargo.toml.gitattributes
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations
Applied to files:
derive-impl/src/pyclass.rsderive-impl/src/pymodule.rscrates/rustpython_doc_db/src/lib.rscrates/rustpython_doc_db/Cargo.tomlCargo.tomlcrates/rustpython_doc_db/generate.pyderive-impl/Cargo.toml.gitattributes
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/rustpython_doc_db/src/lib.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Applied to files:
crates/rustpython_doc_db/src/lib.rs
📚 Learning: 2025-08-09T22:56:24.527Z
Learnt from: jackoconnordev
Repo: RustPython/RustPython PR: 6086
File: benches/microbenchmarks/sort.py:1-3
Timestamp: 2025-08-09T22:56:24.527Z
Learning: In RustPython's microbenchmarks (benches/microbenchmarks/*.py), the variable `ITERATIONS` is intentionally used without being defined in the Python files. It is injected by the cargo bench harness at runtime. This pattern should be maintained for consistency across all microbenchmarks, and F821 lint warnings for undefined `ITERATIONS` are expected and acceptable in this context.
Applied to files:
Cargo.toml
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to **/*.rs : Run clippy and fix any warnings or lints introduced by your changes
Applied to files:
derive-impl/Cargo.toml.gitattributes
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to **/*.rs : Format Rust code with the default rustfmt style (run `cargo fmt`)
Applied to files:
.gitattributes
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to **/*.rs : Follow Rust best practices for error handling and memory management
Applied to files:
.gitattributes
🧬 Code graph analysis (1)
crates/rustpython_doc_db/generate.py (3)
Lib/platform.py (2)
platform(1193-1265)python_version(1125-1133)Lib/importlib/_bootstrap_external.py (1)
ExtensionFileLoader(1267-1317)Lib/warnings.py (1)
catch_warnings(442-515)
🪛 actionlint (1.7.8)
.github/workflows/update-doc-db.yml
56-56: if: condition "false && ${{ inputs.open_pr }}" is always evaluated to true because extra characters are around ${{ }}
(if-cond)
🪛 Flake8 (7.3.0)
crates/rustpython_doc_db/generate.py
[error] 36-36: SyntaxError: invalid syntax
(E999)
🔇 Additional comments (19)
.gitattributes (1)
8-8: LGTM!Correctly marks the generated PHF map files as linguist-generated, which is appropriate for code-generated artifacts.
crates/rustpython_doc_db/LICENSE (1)
1-277: LGTM!The Python Software Foundation License is appropriate for a crate containing documentation derived from Python's standard library.
crates/rustpython_doc_db/src/lib.rs (1)
1-8: LGTM!Platform-specific conditional compilation is correctly structured. The approach ensures the appropriate documentation database is included for each target OS.
Cargo.toml (3)
76-78: LGTM!Setting
codegen-units = 1for the doc database in release builds is a good optimization. Since this crate changes infrequently and contains a large static PHF map, maximizing optimization at the cost of longer compile times is appropriate.
139-139: LGTM!Using the wildcard pattern
crates/*for workspace members is cleaner than listing individual crates and allows for easier future additions.
164-164: LGTM!The rustpython_doc_db and phf dependencies are correctly configured. PHF 0.13.1 with macros feature is appropriate for the perfect hash function implementation.
Also applies to: 171-171
crates/rustpython_doc_db/generate.py (3)
44-49: Verify UNICODE_ESCAPE regex pattern behavior.The regex replacement at line 45 appears to convert
\uNNNNto\u{NNNN}format, but the patternr"\\u([0-9]+)"will match any sequence of digits (not limited to 4). This could cause issues if the input contains sequences like\u123456.Consider whether the pattern should be more specific:
-UNICODE_ESCAPE = re.compile(r"\\u([0-9]+)") +UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")Or if extended unicode is intended:
-UNICODE_ESCAPE = re.compile(r"\\u([0-9]+)") +UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]+)")
93-109: LGTM!Properly suppresses
DeprecationWarningduring module imports and handles import failures gracefully with warnings. This is appropriate for scanning stdlib modules where some may be deprecated or platform-specific.
200-226: LGTM!The main function correctly generates a Rust PHF map with proper formatting, headers, and platform-specific output paths. The sorted output ensures deterministic generation.
.github/workflows/update-doc-db.yml (2)
31-53: LGTM!The generation job correctly runs on all major platforms (Windows, Linux, macOS), executes the Python generator script, detects output changes, and uploads artifacts with appropriate retention settings.
19-19: The review comment is incorrect—Python 3.13.9 is valid.Python 3.13.9 is the latest Python 3.13 release, released October 14, 2025. The specified version in the workflow is correct and no changes are needed.
Likely an incorrect or invalid review comment.
derive-impl/src/pyclass.rs (2)
9-9: LGTM!Correctly imports the centralized documentation database.
317-322: LGTM!The migration to the centralized documentation database is correctly implemented. The
DB.get(&format!(...)).copied().map(str::to_owned)pattern properly handles the lookup and ownership transfer.crates/rustpython_doc_db/Cargo.toml (1)
13-14: Remove or update the unnecessary doctest disable.The crate contains no documentation comments or doctests—the
doctest = falsesetting serves no purpose. The comment "Crashes when true" is unsubstantiated. Either remove the setting or replace the comment with accurate context if there's a specific reason for keeping it disabled.Likely an incorrect or invalid review comment.
derive-impl/Cargo.toml (2)
14-14: LGTM!The dependency migration from
rustpython-doctorustpython_doc_dbcorrectly reflects the PR's objective to move the documentation database into the monorepo.
26-26: LGTM!The lint configuration is now correctly inheriting from the workspace.
derive-impl/src/pymodule.rs (3)
9-9: LGTM!The import of
DBfromrustpython_doc_dbcorrectly provides access to the static documentation database.
104-104: LGTM!The migration to the static DB lookup is correctly implemented. The pattern
DB.get(module_name).copied().map(str::to_owned)appropriately converts the PHF map'sOption<&&'static str>toOption<String>, and the fallback chain ensures code-level documentation takes precedence.
461-466: LGTM!The item-level documentation lookup correctly migrates to the DB-based approach. The key format
"{module}.{py_name}"follows Python's standard naming convention, and the fallback logic preserves the precedence of code-level documentation.The
format!allocation on each lookup is a minor cost during proc-macro expansion, which is acceptable given this code runs at compile time rather than runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/update-doc-db.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-07-10T10:08:43.330Z
Learnt from: ShaharNaveh
Repo: RustPython/RustPython PR: 5932
File: .github/workflows/comment-commands.yml:18-24
Timestamp: 2025-07-10T10:08:43.330Z
Learning: In GitHub Actions workflows for the RustPython project, the maintainer ShaharNaveh prefers to keep workflows simple and doesn't mind if steps fail when the desired state is already achieved (e.g., user already assigned to an issue). Avoid suggesting complex error handling for edge cases they don't consider problematic.
Applied to files:
.github/workflows/update-doc-db.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run tests under miri
- GitHub Check: Check Rust code with rustfmt and clippy
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Check the WASM package and demo
🔇 Additional comments (1)
.github/workflows/update-doc-db.yml (1)
31-32: Sparse-checkout is sufficient for generate.py.Verification confirms that
generate.pyis completely self-contained withincrates/rustpython_doc_dband does not require access to files outside this directory. The script:
- Anchors all operations to
CRATE_DIR(the script's parent directory)- Writes output exclusively to
CRATE_DIR / "src" / *.inc.rs- Contains no file read operations, config file loading, or external data dependencies
- Uses only standard library imports with no workspace or sibling crate references
The workflow's sparse-checkout configuration is correct and sufficient.
.github/workflows/update-doc-db.yml
Outdated
| - name: Generate (${{ matrix.os }}) | ||
| run: python crates/rustpython_doc_db/generate.py | ||
|
|
||
| - name: Detect output file | ||
| id: diff | ||
| run: echo "diff=$(git diff --name-only)" >> $GITHUB_OUTPUT | ||
|
|
||
| - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 | ||
| with: | ||
| name: doc-db-${{ env.PYTHON_VERSION }}-${{ matrix.os }} | ||
| path: ${{ steps.diff.outputs.diff }} | ||
| if-no-files-found: error | ||
| retention-days: 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major: Workflow fails if generate script produces no changes.
The workflow detects output via git diff --name-only (line 43) and uploads artifacts with if-no-files-found: error (line 49). If the generated output is already up-to-date (no changes), the job fails, which contradicts the use case where this workflow may be idempotent.
Additionally, line 39 runs the generate script with no explicit error handling—if it fails, the workflow continues to the artifact upload step.
Consider:
- For idempotent behavior: Use
if-no-files-found: warninstead oferrorto allow the workflow to complete gracefully when no changes are detected. - For explicit error handling: Add
set -eor explicit checks after the generate step.
Apply this diff:
- name: Generate (${{ matrix.os }})
run: python crates/rustpython_doc_db/generate.py
- name: Detect output file
id: diff
run: echo "diff=$(git diff --name-only)" >> $GITHUB_OUTPUT
- uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: doc-db-${{ env.PYTHON_VERSION }}-${{ matrix.os }}
path: ${{ steps.diff.outputs.diff }}
- if-no-files-found: error
+ if-no-files-found: warnBased on learnings, the maintainers prefer simplicity and accept workflows that may fail gracefully in edge cases; however, the intent here is unclear.
🤖 Prompt for AI Agents
.github/workflows/update-doc-db.yml lines 38-50: the workflow will fail when the
generate script produces no git-diff (upload-artifact uses if-no-files-found:
error) and also doesn't stop early if the generate script itself fails; change
the upload step to use if-no-files-found: warn to make idempotent runs succeed
with no changes, and make the generate step fail fast by running the script
under a strict shell (e.g., add set -e before running the python command or
check the script exit status and exit non-zero on failure) so the job correctly
surfaces generation errors.
|
Looks great, I have one more question. Is it possible to merge 3 inc files into a single file? They have subsets but can be merged into a single big file. Then we don't need to worry about missing architecture when building |
Yeah sure, we can do that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
crates/rustpython_doc_db/src/lib.rs (1)
7-11: Consider testing DB existence more generally.The test currently hardcodes a specific key (
"array._array_reconstructor"). If the generated data changes or this entry is removed, the test will break. Consider either:
- Testing that the DB is not empty:
assert!(!DB.is_empty());- Testing multiple well-known entries to be more robust
#[test] fn test_db() { - let doc = DB.get("array._array_reconstructor"); - assert!(doc.is_some()); + // Verify DB contains documentation entries + assert!(!DB.is_empty(), "DB should contain documentation entries"); + + // Verify at least one known entry exists + assert!( + DB.get("builtins.dict").is_some() || DB.get("builtins.list").is_some(), + "DB should contain common builtin types" + ); }
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.github/workflows/update-doc-db.yml(1 hunks)crates/rustpython_doc_db/.gitignore(1 hunks)crates/rustpython_doc_db/generate.py(1 hunks)crates/rustpython_doc_db/src/lib.rs(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- crates/rustpython_doc_db/.gitignore
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/update-doc-db.yml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Format Rust code with the default rustfmt style (runcargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management
Files:
crates/rustpython_doc_db/src/lib.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations
Applied to files:
crates/rustpython_doc_db/src/lib.rscrates/rustpython_doc_db/generate.py
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
crates/rustpython_doc_db/src/lib.rs
🧬 Code graph analysis (1)
crates/rustpython_doc_db/generate.py (2)
Lib/importlib/_bootstrap_external.py (1)
ExtensionFileLoader(1267-1317)Lib/warnings.py (1)
catch_warnings(442-515)
🪛 Flake8 (7.3.0)
crates/rustpython_doc_db/generate.py
[error] 37-37: SyntaxError: invalid syntax
(E999)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Check Rust code with rustfmt and clippy
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Check the WASM package and demo
🔇 Additional comments (6)
crates/rustpython_doc_db/src/lib.rs (1)
1-1: LGTM: Appropriate use of include! for generated code.The
include!macro is the standard approach for incorporating generated code in Rust.crates/rustpython_doc_db/generate.py (5)
1-34: LGTM: Imports and constants are well-structured.Good use of
pathliband proper handling of the output directory creation.
53-92: LGTM: Robust C extension detection logic.The function uses multiple fallback approaches to reliably detect C extensions.
94-121: LGTM: Proper warning handling and module iteration.Good use of
warnings.catch_warnings()to suppress deprecation warnings while still reporting import failures.
176-176: Verify builtins usage across different contexts.Using
__builtins__directly can be fragile as it's adictin__main__and a module in other contexts. Whiletraverse()handles both cases, it's worth verifying this works correctly.Consider using the
builtinsmodule explicitly:+import builtins + def find_doc_entires() -> "Iterable[DocEntry]": yield from ( doc_entry for module in iter_c_modules() for doc_entry in traverse(module, module) ) - yield from (doc_entry for doc_entry in traverse(__builtins__, __builtins__)) + yield from (doc_entry for doc_entry in traverse(builtins, builtins)) builtin_types = [ # ... ] for typ in builtin_types: parts = ("builtins", typ.__name__) yield DocEntry(parts, pydoc._getowndoc(typ)) - yield from traverse(typ, __builtins__, parts) + yield from traverse(typ, builtins, parts)
37-37: ****The
typestatement syntax is appropriate for this codebase. RustPython explicitly targets Python 3.13.0+ (as stated in README.md: "A Python-3 (CPython >= 3.13.0) Interpreter written in Rust"), and the Python 3.12+typesyntax (PEP 695) is consistent with multiple other Python files in the project (scripts/lib_updater.py, Lib/typing.py, Lib/test/typinganndata/mod_generics_cache.py, and Lib/test/test_symtable.py). No compatibility issue exists.Likely an incorrect or invalid review comment.
| @property | ||
| def doc(self) -> str: | ||
| return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle None case in doc property.
The doc property calls inspect.cleandoc(self.raw_doc) but raw_doc can be None (as declared in line 42). This will raise an error if raw_doc is None.
Apply this diff to handle the None case:
@property
def doc(self) -> str:
- return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
+ if self.raw_doc is None:
+ return ""
+ return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))Or change the return type to match the optional nature:
@property
- def doc(self) -> str:
+ def doc(self) -> str | None:
+ if self.raw_doc is None:
+ return None
return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @property | |
| def doc(self) -> str: | |
| return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc)) | |
| @property | |
| def doc(self) -> str: | |
| if self.raw_doc is None: | |
| return "" | |
| return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc)) |
| @property | |
| def doc(self) -> str: | |
| return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc)) | |
| @property | |
| def doc(self) -> str | None: | |
| if self.raw_doc is None: | |
| return None | |
| return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc)) |
🤖 Prompt for AI Agents
In crates/rustpython_doc_db/generate.py around lines 48 to 50, the doc property
calls inspect.cleandoc(self.raw_doc) but raw_doc can be None, which will raise;
update the property to handle the None case by checking if self.raw_doc is None
and returning an appropriate value (e.g., empty string) before calling
inspect.cleandoc and re.sub, or alternatively change the property return type to
Optional[str] and return None when raw_doc is None; ensure the branch avoids
calling inspect.cleandoc with None and keep the rest of the processing
unchanged.
| attr_typ = type(attr) | ||
| is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__))) | ||
|
|
||
| if is_type_or_builtin: | ||
| yield from traverse(attr, module, new_parts) | ||
| continue | ||
|
|
||
| is_callable = ( | ||
| callable(attr) | ||
| or not issubclass(attr_typ, type) | ||
| or attr_typ.__name__ in ("getset_descriptor", "member_descriptor") | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential edge cases in type checking logic.
Two potential issues in this type-checking code:
-
Line 149:
type(__builtins__)may not work consistently.__builtins__can be adictin some contexts (e.g.,__main__) and a module in others. -
Line 157:
issubclass(attr_typ, type)can raiseTypeErrorifattr_typis not a class (e.g., if it's a special descriptor or non-type object).
Consider these improvements:
attr_typ = type(attr)
- is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__)))
+ # Handle __builtins__ which can be dict or module
+ builtins_type = type(__builtins__) if isinstance(__builtins__, types.ModuleType) else types.ModuleType
+ is_type_or_builtin = any(attr_typ is x for x in (type, builtins_type))
if is_type_or_builtin:
yield from traverse(attr, module, new_parts)
continue
is_callable = (
callable(attr)
- or not issubclass(attr_typ, type)
+ or not (inspect.isclass(attr_typ) and issubclass(attr_typ, type))
or attr_typ.__name__ in ("getset_descriptor", "member_descriptor")
)Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In crates/rustpython_doc_db/generate.py around lines 148 to 159, the current
checks use type(__builtins__) and issubclass(attr_typ, type) which can break
when __builtins__ is a dict or attr_typ is not a class; change the logic to use
isinstance(attr, type) to detect real types instead of comparing attr_typ to
type, guard the builtin check by testing whether __builtins__ is a module (e.g.
isinstance(__builtins__, types.ModuleType)) before using its type, and replace
the raw issubclass call with a try/except TypeError (or use isinstance(attr,
type) where appropriate) so non-class attr_typs don’t raise — in short, use
isinstance checks for types, guard builtin-module handling, and catch TypeError
around issubclass to safely fall back to False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
crates/rustpython_doc_db/generate.py (1)
150-161: Guard against type-checking edge cases.Two potential runtime errors remain from the previous review:
- Line 151:
type(__builtins__)may not be consistent—__builtins__can be a dict in__main__or a module elsewhere.- Line 159:
issubclass(attr_typ, type)raisesTypeErrorifattr_typis not a class (e.g., a descriptor or special object).Consider this safer approach:
- attr_typ = type(attr) - is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__))) + # Check if attr itself is a type (class) + is_type_or_builtin = isinstance(attr, type) if is_type_or_builtin: yield from traverse(attr, module, new_parts) continue is_callable = ( callable(attr) - or not issubclass(attr_typ, type) + or not isinstance(attr, type) or attr_typ.__name__ in ("getset_descriptor", "member_descriptor") )This avoids the
__builtins__type inconsistency and usesisinstance(attr, type)instead ofissubclass, which won't raiseTypeError.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
crates/rustpython_doc_db/generate.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations
Applied to files:
crates/rustpython_doc_db/generate.py
🧬 Code graph analysis (1)
crates/rustpython_doc_db/generate.py (2)
Lib/importlib/_bootstrap_external.py (1)
ExtensionFileLoader(1267-1317)Lib/warnings.py (1)
catch_warnings(442-515)
🪛 Flake8 (7.3.0)
crates/rustpython_doc_db/generate.py
[error] 37-37: SyntaxError: invalid syntax
(E999)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Check Rust code with rustfmt and clippy
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Ensure compilation on various targets
🔇 Additional comments (2)
crates/rustpython_doc_db/generate.py (2)
48-52: Good addition of assertion to document precondition.The
assert self.raw_doc is not Noneon line 50 clearly documents that this property should only be called whenraw_docis not None. Sincemain()filters entries (lines 206-207) before accessing.doc, this precondition is satisfied in practice.
203-211: LGTM! Clean filtering and output logic.The filtering on line 207 (
if entry.raw_doc is not None) correctly ensures that only entries with documentation are included, and the subsequent.docproperty access is safe. The JSON output formatting is appropriate.
| } | ||
|
|
||
|
|
||
| type Parts = tuple[str, ...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify Python 3.12+ requirement or use backwards-compatible syntax.
The type statement for type aliases is Python 3.12+ syntax (PEP 695). If this script needs to support earlier Python versions, use the backwards-compatible form.
Run this script to check the minimum Python version required by the project:
If Python 3.12+ is not guaranteed, apply this diff:
-type Parts = tuple[str, ...]
+Parts: typing.TypeAlias = tuple[str, ...]🏁 Script executed:
#!/bin/bash
# Check for Python version requirements in the repository
rg -n "python_requires|requires-python|python_version" -g "*.toml" -g "*.cfg" -g "*.ini" -g "pyproject.toml" -g "setup.py" -g "setup.cfg"Length of output: 89
Let me check for more specific version requirements for the doc_db module and project configuration files.
🏁 Script executed:
#!/bin/bash
# Search for pyproject.toml files and version specifications
find . -name "pyproject.toml" -o -name "Cargo.toml" | head -20 | xargs grep -l "rustpython_doc_db\|python.*=" 2>/dev/null | head -10Length of output: 313
🏁 Script executed:
#!/bin/bash
# Look for the Cargo.toml in the doc_db crate specifically
fd -n "Cargo.toml" | xargs grep -A5 "rustpython_doc_db" 2>/dev/nullLength of output: 230
🏁 Script executed:
#!/bin/bash
# Locate the doc_db crate and check its configuration
find . -path "*/rustpython_doc_db/Cargo.toml" -o -path "*/rustpython_doc_db/pyproject.toml" | head -10Length of output: 37
🏁 Script executed:
#!/bin/bash
# Check the actual Cargo.toml for doc_db crate
cat crates/rustpython_doc_db/Cargo.toml 2>/dev/null || echo "File not found"Length of output: 306
🏁 Script executed:
#!/bin/bash
# Look for Python interpreter version requirement in the generate.py file itself
head -5 crates/rustpython_doc_db/generate.pyLength of output: 74
🏁 Script executed:
#!/bin/bash
# Check if there's a shebang or version comment indicating Python requirements
rg -n "python|Python|version" crates/rustpython_doc_db/generate.py | head -20Length of output: 112
🏁 Script executed:
#!/bin/bash
# Check project root pyproject.toml for Python version requirements
cat pyproject.toml 2>/dev/null | head -50Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if there's any documentation about tools Python version
find . -name "README*" -o -name "CONTRIBUTING*" | xargs grep -i "python.*version\|version.*python" 2>/dev/null | head -10Length of output: 172
🏁 Script executed:
#!/bin/bash
# Check for CI/build files that might specify Python version for tools
find . -path ".github/workflows/*.yml" -o -path ".gitlab-ci.yml" -o -path ".circleci/config.yml" -o -path "*.md" | xargs grep -l "generate.py\|python.*3\." 2>/dev/null | head -10Length of output: 19
🏁 Script executed:
#!/bin/bash
# Check for any scripts directory or Makefile that might call generate.py
find . -name "Makefile" -o -name "*.sh" | xargs grep -l "generate.py" 2>/dev/nullLength of output: 0
🏁 Script executed:
#!/bin/bash
# Check if there's any Python version pinning in requirements or similar files
find . -name "requirements*.txt" -o -name ".python-version" -o -name "tox.ini" | head -10Length of output: 60
🏁 Script executed:
#!/bin/bash
# Search for any mention of generate.py in documentation
find . -name "*.md" | xargs grep -B2 -A2 "generate.py" 2>/dev/null | head -30Length of output: 0
Use backwards-compatible type alias syntax.
Line 37 uses the Python 3.12+ type statement (PEP 695), which causes a SyntaxError on Python 3.11 or earlier. Since no explicit Python 3.12+ requirement is documented in the repository, use the backwards-compatible form:
-type Parts = tuple[str, ...]
+Parts: typing.TypeAlias = tuple[str, ...]Also add import typing to the imports if not already present.
🧰 Tools
🪛 Flake8 (7.3.0)
[error] 37-37: SyntaxError: invalid syntax
(E999)
🤖 Prompt for AI Agents
In crates/rustpython_doc_db/generate.py around line 37, the PEP 695 `type Parts
= tuple[str, ...]` syntax is not compatible with Python <3.12; replace it with a
backwards-compatible alias (e.g., use typing.Tuple: Parts = typing.Tuple[str,
...]) and ensure there is an import typing at the top of the file (add it if
missing).
| for module in iter_c_modules() | ||
| for doc_entry in traverse(module, module) | ||
| ) | ||
| yield from (doc_entry for doc_entry in traverse(__builtins__, __builtins__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify __builtins__ handling in different contexts.
__builtins__ can be a dict (in __main__) or a module (elsewhere), which may cause issues when passed to traverse() as both object and module. The is_child_of check on line 141 relies on inspect.getmodule(obj), which returns None for non-module types.
Run this script to check how __builtins__ is used and whether it causes issues:
🏁 Script executed:
#!/bin/bash
# Test the script with different Python contexts to see if __builtins__ handling works
python -c "
import types
print('__builtins__ type:', type(__builtins__))
print('Is module:', isinstance(__builtins__, types.ModuleType))
"
# Also check if the script runs without errors
cd crates/rustpython_doc_db
python generate.py 2>&1 | head -20Length of output: 201
I need to examine the actual implementation of the traverse and is_child_of functions to determine if the __builtins__ dict case is properly handled.
🏁 Script executed:
#!/bin/bash
# Find the traverse function and is_child_of function implementations
rg -nP -A 15 'def traverse\s*\(' crates/rustpython_doc_db/Length of output: 1119
🏁 Script executed:
#!/bin/bash
# Find the is_child_of function
rg -nP -A 10 'def is_child_of\s*\(' crates/rustpython_doc_db/Length of output: 656
🏁 Script executed:
#!/bin/bash
# Also examine the find_doc_entries function around line 178
sed -n '170,205p' crates/rustpython_doc_db/generate.pyLength of output: 1029
Fix __builtins__ handling to work in __main__ context.
The original concern is correct: __builtins__ is a dict when the script runs in __main__, causing is_child_of() to fail. Lines 178, 179, and 199 pass __builtins__ to traverse(), which expects a types.ModuleType. When is_child_of() executes inspect.getmodule(obj) is module, comparing a module/None against a dict always returns False, breaking documentation collection.
Recommended fix: Import the actual builtins module and use it instead of __builtins__:
import builtins
# Replace __builtins__ with builtins in lines 178, 179, and 199🤖 Prompt for AI Agents
In crates/rustpython_doc_db/generate.py around lines 178, 179 and 199, passing
__builtins__ (a dict when run as __main__) into traverse() breaks is_child_of()
which expects a module; import the standard builtins module and replace
occurrences of __builtins__ at those lines with the builtins module so traverse
receives a types.ModuleType instead of a dict.
youknowone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much!
@coolreader18 you have to review crate naming policy not to block these changes
crates/rustpython_doc_db/src/lib.rs
Outdated
| #[cfg(any(target_os = "linux", target_os = "android"))] | ||
| include!("./linux.inc.rs"); | ||
|
|
||
| #[cfg(any(target_os = "macos", target_os = "ios"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #[cfg(any(target_os = "macos", target_os = "ios"))] | |
| #[cfg(target_vendor = "apple")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is outdated. now everything is merged into a single file
Cargo.toml
Outdated
| rustpython-sre_engine = { path = "vm/sre_engine", version = "0.4.0" } | ||
| rustpython-wtf8 = { path = "wtf8", version = "0.4.0" } | ||
| rustpython-doc = { git = "https://github.com/RustPython/__doc__", tag = "0.3.0", version = "0.3.0" } | ||
| rustpython_doc_db = { path = "crates/rustpython_doc_db", version = "0.4.0" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| rustpython_doc_db = { path = "crates/rustpython_doc_db", version = "0.4.0" } | |
| rustpython-doc_db = { path = "crates/rustpython_doc_db", version = "0.4.0" } |
- is a namespace split, _ is a (sort of) whitespace split
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is worth discussing more. I haven't seen a single project that has both - and _ in it's name. I think we should stick to one separator and use it (I'm in favor of _)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have that names for every crates.
I don't think mixing rustpython- and rustpython_ for different crates gives any value.
Actually I was missing the fundamental question about this change.
We'd better to keep using the previous crate name rustpython-doc. Moving crate doesn't mean renaming(creating a new) crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have that names for every crates. I don't think mixing
rustpython-andrustpython_for different crates gives any value.
think we should choose one sperator and stick to it, as long as it's consistent.
Actually I was missing the fundamental question about this change. We'd better to keep using the previous crate name
rustpython-doc. Moving crate doesn't mean renaming(creating a new) crate.
Yeah, ofc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think we should choose one sperator and stick to it, as long as it's consistent.
I understand that there’s some validity to that point.
However, it’s not fair to justify it simply by calling it “consistent,” because the fact that all RustPython crates share the rustpython- prefix is already a form of consistency in itself.
It’s just a different kind of consistency.
If these two forms of consistency conflict, I would reject introducing a new crate with a different prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
crates/rustpython-doc/generate.py (3)
37-37: Critical: Use backwards-compatible type alias syntax.The PEP 695
typestatement requires Python 3.12+. Since no Python 3.12+ requirement is documented, use the backwards-compatible form to ensure the script works on earlier Python versions.Apply this diff:
-type Parts = tuple[str, ...] +Parts: typing.TypeAlias = tuple[str, ...]
178-201: Critical: Fix__builtins__handling for__main__context.When this script runs as
__main__,__builtins__is adict, not a module. Lines 178 and 200 pass__builtins__totraverse(), which expects atypes.ModuleTypeas the second parameter. This causesis_child_of()to fail sinceinspect.getmodule(obj) is modulewill compare a module/None against a dict.Apply this diff:
+import builtins + def find_doc_entries() -> "Iterable[DocEntry]": yield from ( doc_entry for module in iter_c_modules() for doc_entry in traverse(module, module) ) - yield from (doc_entry for doc_entry in traverse(__builtins__, __builtins__)) + yield from (doc_entry for doc_entry in traverse(builtins, builtins)) builtin_types = [ type(None), type(bytearray().__iter__()), ... ] for typ in builtin_types: parts = ("builtins", typ.__name__) yield DocEntry(parts, pydoc._getowndoc(typ)) - yield from traverse(typ, __builtins__, parts) + yield from traverse(typ, builtins, parts)
150-161: Guard against TypeError in type checking logic.Line 159's
issubclass(attr_typ, type)can raiseTypeErrorifattr_typis not a class (e.g., special descriptors, generic aliases). Thecallable(attr)check doesn't fully protect against this.Additionally, line 151's
type(__builtins__)is fragile since__builtins__can be a dict in some contexts.Apply this diff to make the checks more robust:
attr_typ = type(attr) -is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__))) +# Check if attr itself is a type/class +is_type_or_builtin = isinstance(attr, type) if is_type_or_builtin: yield from traverse(attr, module, new_parts) continue is_callable = ( callable(attr) - or not issubclass(attr_typ, type) + or not (inspect.isclass(attr_typ) and issubclass(attr_typ, type)) or attr_typ.__name__ in ("getset_descriptor", "member_descriptor") )
🧹 Nitpick comments (1)
crates/rustpython-doc/generate.py (1)
48-52: Consider a more defensive approach for the doc property.The
assertat line 50 assumesraw_docis notNone, which is currently safe becausemain()(line 207) filters out entries withNoneraw_doc before calling.doc. However, adding a more descriptive error or returning an empty string would make this more defensive.Optional improvement:
@property def doc(self) -> str: - assert self.raw_doc is not None - + if self.raw_doc is None: + return "" return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
.github/workflows/update-doc-db.yml(1 hunks)Cargo.toml(3 hunks)crates/rustpython-doc/.gitignore(1 hunks)crates/rustpython-doc/Cargo.toml(1 hunks)crates/rustpython-doc/LICENSE(1 hunks)crates/rustpython-doc/generate.py(1 hunks)crates/rustpython-doc/src/lib.rs(1 hunks)derive-impl/Cargo.toml(1 hunks)derive-impl/src/pyclass.rs(2 hunks)derive-impl/src/pymodule.rs(3 hunks)
✅ Files skipped from review due to trivial changes (3)
- crates/rustpython-doc/Cargo.toml
- crates/rustpython-doc/src/lib.rs
- crates/rustpython-doc/.gitignore
🚧 Files skipped from review as they are similar to previous changes (4)
- derive-impl/src/pymodule.rs
- derive-impl/src/pyclass.rs
- .github/workflows/update-doc-db.yml
- derive-impl/Cargo.toml
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations
Applied to files:
crates/rustpython-doc/LICENSEcrates/rustpython-doc/generate.pyCargo.toml
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/test/**/*.py : When modifying CPython tests, add a '# TODO: RUSTPYTHON' comment documenting the reason
Applied to files:
crates/rustpython-doc/generate.py
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
Cargo.toml
📚 Learning: 2025-08-09T22:56:24.527Z
Learnt from: jackoconnordev
Repo: RustPython/RustPython PR: 6086
File: benches/microbenchmarks/sort.py:1-3
Timestamp: 2025-08-09T22:56:24.527Z
Learning: In RustPython's microbenchmarks (benches/microbenchmarks/*.py), the variable `ITERATIONS` is intentionally used without being defined in the Python files. It is injected by the cargo bench harness at runtime. This pattern should be maintained for consistency across all microbenchmarks, and F821 lint warnings for undefined `ITERATIONS` are expected and acceptable in this context.
Applied to files:
Cargo.toml
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: In most cases, avoid editing Python code; prefer fixing bugs via Rust changes
Applied to files:
Cargo.toml
🧬 Code graph analysis (1)
crates/rustpython-doc/generate.py (2)
Lib/importlib/_bootstrap_external.py (1)
ExtensionFileLoader(1267-1317)Lib/warnings.py (1)
catch_warnings(442-515)
🪛 Flake8 (7.3.0)
crates/rustpython-doc/generate.py
[error] 37-37: SyntaxError: invalid syntax
(E999)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: auto_format
🔇 Additional comments (5)
crates/rustpython-doc/LICENSE (1)
1-277: LGTM: Standard Python license file.The Python Software Foundation license text is appropriate for a crate that handles Python standard library documentation.
Cargo.toml (4)
76-78: LGTM: Appropriate optimization for a data-heavy crate.Setting
codegen-units = 1for the doc database crate trades longer compile time for better optimization, which is reasonable for data that "doesn't change often."
139-139: LGTM: Standard workspace glob pattern.Adding
crates/*appropriately includes the new rustpython-doc crate and any future crates in that directory.
164-164: LGTM: Crate successfully migrated to workspace.The path-based dependency correctly integrates the formerly external rustpython-doc crate into the workspace. The naming convention (dash separator) is consistent with other RustPython crates.
171-171: LGTM: phf dependency for compile-time hash maps.The phf dependency configuration is appropriate for generating compile-time perfect hash maps, addressing the HashMap issues mentioned in the PR objectives when handling ~9,000 documentation entries.
youknowone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
I will wait for @coolreader18 a few more days about the crate relocation issue.
|
@youknowone I ain't sure about this CI failure. I think it's related to #6237 |
__doc__ crate to crates/rustpython_doc_db__doc__ crate to crates/doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
crates/doc/generate.py (4)
37-37: Critical: Use backwards-compatible type alias syntax.Line 37 uses Python 3.12+
typestatement syntax (PEP 695), which will cause aSyntaxErroron Python 3.11 or earlier. Since no explicit Python 3.12+ requirement is documented, use the backwards-compatible form.Apply this diff:
-type Parts = tuple[str, ...] +Parts: typing.TypeAlias = tuple[str, ...]
150-161: Fix type checking logic to avoid errors and false positives.Two issues in this type-checking code:
Line 151:
type(__builtins__)is unreliable because__builtins__can be adictin__main__context or a module elsewhere. Comparingattr_typtotype(__builtins__)will give inconsistent results.Line 159:
issubclass(attr_typ, type)can raiseTypeErrorifattr_typis not a class (e.g., if it's a descriptor or special object).Consider this approach:
- attr_typ = type(attr) - is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__))) + # Check if attr itself is a type/class + is_type_or_builtin = isinstance(attr, type) if is_type_or_builtin: yield from traverse(attr, module, new_parts) continue - is_callable = ( - callable(attr) - or not issubclass(attr_typ, type) - or attr_typ.__name__ in ("getset_descriptor", "member_descriptor") - ) + attr_typ = type(attr) + try: + is_subclass_of_type = issubclass(attr_typ, type) + except TypeError: + is_subclass_of_type = False + + is_callable = ( + callable(attr) + or not is_subclass_of_type + or attr_typ.__name__ in ("getset_descriptor", "member_descriptor") + )
178-178: Critical: Import and use the builtins module instead of builtins.Using
__builtins__directly is problematic because it's adictwhen the script runs as__main__but a module in other contexts. When it's adict,traverse()andis_child_of()will fail because they expecttypes.ModuleType.Apply this fix:
+import builtins + def find_doc_entries() -> "Iterable[DocEntry]": yield from ( doc_entry for module in iter_c_modules() for doc_entry in traverse(module, module) ) - yield from (doc_entry for doc_entry in traverse(__builtins__, __builtins__)) + yield from (doc_entry for doc_entry in traverse(builtins, builtins))Also update line 200 to use
builtinsinstead of__builtins__.
197-200: Apply the same builtins fix here.Line 200 also uses
__builtins__and needs the same fix as line 178. After importingbuiltinsat the top:for typ in builtin_types: parts = ("builtins", typ.__name__) yield DocEntry(parts, pydoc._getowndoc(typ)) - yield from traverse(typ, __builtins__, parts) + yield from traverse(typ, builtins, parts)
🧹 Nitpick comments (2)
crates/doc/Cargo.toml (1)
13-14: Clarify the doctest crash and consider alternatives.The comment "Crashes when true" is vague and doesn't explain the root cause. Please:
- Document the specific error message or panic that occurs
- Consider whether this is a fundamental limitation or a solvable issue
- If it's due to the included generated data, add a more descriptive comment
Understanding the underlying issue will help future maintainers decide whether this is a permanent workaround or something that can be fixed.
crates/doc/generate.py (1)
48-52: Consider handling None more gracefully in the doc property.The
assertat line 50 will prevent crashes whenraw_docisNone, but assertions can be disabled with Python's-Oflag. Sincemain()already filters forraw_doc is not Noneat line 207, this shouldn't trigger in practice. However, for robustness, consider either:
- Returning an empty string when
raw_doc is None- Changing the return type to
str | Noneand returningNoneThis makes the API more defensive and self-documenting.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
Cargo.toml(3 hunks)crates/doc/.gitignore(1 hunks)crates/doc/Cargo.toml(1 hunks)crates/doc/LICENSE(1 hunks)crates/doc/generate.py(1 hunks)crates/doc/src/lib.rs(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- crates/doc/.gitignore
- crates/doc/LICENSE
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-09T22:56:24.527Z
Learnt from: jackoconnordev
Repo: RustPython/RustPython PR: 6086
File: benches/microbenchmarks/sort.py:1-3
Timestamp: 2025-08-09T22:56:24.527Z
Learning: In RustPython's microbenchmarks (benches/microbenchmarks/*.py), the variable `ITERATIONS` is intentionally used without being defined in the Python files. It is injected by the cargo bench harness at runtime. This pattern should be maintained for consistency across all microbenchmarks, and F821 lint warnings for undefined `ITERATIONS` are expected and acceptable in this context.
Applied to files:
Cargo.toml
🪛 Flake8 (7.3.0)
crates/doc/generate.py
[error] 37-37: SyntaxError: invalid syntax
(E999)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run tests under miri
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Check Rust code with clippy
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: auto_format
🔇 Additional comments (10)
Cargo.toml (4)
76-78: LGTM! Appropriate optimization for a data-heavy crate.Setting
codegen-units = 1for the rustpython-doc package enables maximum optimization, which is appropriate given that this crate contains static documentation data that changes infrequently.
139-139: LGTM! Clean workspace organization.Using the
"crates/*"glob pattern is a good practice that simplifies adding new crates to the workspace.
164-164: Verify the crate path matches the PR objectives.The PR title mentions moving to
crates/rustpython_doc_db, but the actual path here iscrates/doc. Please confirm this is intentional, as the discrepancy could cause confusion for future maintainers.
171-171: phf 0.13.1 is confirmed as the latest stable version.The dependency configuration is correct and up-to-date. The explicit
default-features = falsewith the"macros"feature is an appropriate choice for this use case.crates/doc/Cargo.toml (1)
1-9: LGTM! Standard crate configuration.The package metadata properly inherits workspace values and is well-configured.
crates/doc/src/lib.rs (2)
1-1: Verify the build process generates data.inc.rs.The included file
data.inc.rsis not present in the PR (likely gitignored). Please ensure that:
- The build process (or a pre-build step) generates this file
- Documentation explains how to regenerate it
- CI validates that the generated file is up-to-date
3-12: LGTM! Adequate smoke test for the generated documentation database.The test validates that the generated
DBcontains expected entries, which is appropriate for ensuring the data generation pipeline works correctly.crates/doc/generate.py (3)
1-20: LGTM! Clean script setup with appropriate OS-specific output paths.The imports and output file configuration are well-structured.
55-83: LGTM! Well-implemented C extension detection.The function uses multiple fallback checks and is well-documented.
203-214: LGTM! Clean data generation and output.The main function properly filters entries and produces well-formatted JSON output.
youknowone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! And thank you so much for your patience
Moved https://github.com/RustPython/__doc__.git to here, where it's going to be easier to maintain.
I did do some modifications to the original crate, one of them is to not use
once_cell::sync::Lazy, that's because it's redundant (I think) as it's accessed via the vm & stdlib at compile time, so it's never going to be "lazily evaluated".Second change I did was to use https://github.com/rust-phf/rust-phf because the rust's HashMap started to choke when inserting ~9K entries
Summary by CodeRabbit
New Features
Chores
Documentation
Tests