Skip to content

Conversation

@ShaharNaveh
Copy link
Collaborator

@ShaharNaveh ShaharNaveh commented Nov 3, 2025

Moved https://github.com/RustPython/__doc__.git to here, where it's going to be easier to maintain.

I did do some modifications to the original crate, one of them is to not use once_cell::sync::Lazy , that's because it's redundant (I think) as it's accessed via the vm & stdlib at compile time, so it's never going to be "lazily evaluated".

Second change I did was to use https://github.com/rust-phf/rust-phf because the rust's HashMap started to choke when inserting ~9K entries

Summary by CodeRabbit

  • New Features

    • Centralized documentation database enabling consistent in-app doc lookups.
    • Automated documentation generation workflow to keep standard library and builtins docs up to date.
  • Chores

    • Workspace reorganization and dependency adjustments to support the doc crate.
    • CI timeout increases for improved test stability.
  • Documentation

    • Added comprehensive license and ignore rules for generated documentation.
  • Tests

    • Added a basic retrieval test to validate the doc database.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 3, 2025

Walkthrough

This PR adds a new workspace crate crates/doc that generates a static documentation DB (phf map) from Python stdlib/builtins, wires that DB into derive-impl to read docs at compile-time, introduces a GitHub Actions workflow to build the DB from multiple platforms, and updates workspace/Cargo and ancillary metadata.

Changes

Cohort / File(s) Summary
Repository metadata
\.gitattributes``
Mark crates/rustpython_doc/src/*.inc.rs as linguist-generated.
GitHub Actions workflow
\.github/workflows/update-doc-db.yml``
New workflow "Update doc DB": generate per-platform JSON docs, merge them, format into src/data.inc.rs containing a phf::Map DB, and upload artifact.
CI adjustments
\.github/workflows/ci.yaml``
Increased Windows job timeouts from 30 to 35 minutes for two jobs.
Workspace root manifest
\Cargo.toml``
Add crates/* workspace members, add phf workspace dependency, switch rustpython-doc from git to local crates/doc path (v0.4.0), and add a release profile for rustpython-doc.
New doc crate - config & license
\crates/doc/Cargo.toml`, `crates/doc/LICENSE`, `crates/doc/.gitignore``
Add new rustpython-doc crate metadata, include LICENSE text, and ignore generated/ directory.
New doc crate - implementation
\crates/doc/src/lib.rs`, `crates/doc/generate.py`, `crates/doc/README?* (generated artifact)``
Add library that includes generated data.inc.rs and a test; add generate.py script that extracts docs from Python stdlib/builtins and emits platform-specific JSON for workflow merging.
derive-impl build config
\derive-impl/Cargo.toml``
Fix malformed line (workspace = true).
derive-impl code updates
\derive-impl/src/lib.rs`, `derive-impl/src/pyclass.rs`, `derive-impl/src/pymodule.rs``
Remove previous internal doc import and switch doc lookups to rustpython_doc::DB (e.g., DB.get("module.name")) for class/module/function docs; added use rustpython_doc::DB.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant GH as GitHub Actions
    participant Gen as Generate Job (multi-OS)
    participant Merge as Merge Job
    participant Artifact as src/data.inc.rs
    participant Build as Cargo Build / crate compile
    participant Macro as derive-impl runtime (compile-time include)

    GH->>Gen: workflow_dispatch / push triggers
    Gen->>Gen: run crates/doc/generate.py per OS
    Gen-->>GH: upload platform JSON artifacts
    GH->>Merge: start after Gen completes
    Merge->>Merge: download artifacts & merge -> merged.json
    Merge->>Merge: format merged -> phf map source
    Merge->>Artifact: create src/data.inc.rs (DB)
    Merge-->>GH: upload Rust source artifact
    Build->>Artifact: include!("./data.inc.rs") at compile-time
    Macro->>Macro: derive-impl calls DB.get("module.name")
    Macro-->>Build: doc string included in generated code
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas needing extra attention:
    • crates/doc/generate.py — module traversal and doc extraction correctness, unicode escaping.
    • .github/workflows/update-doc-db.yml — artifact upload/download, JSON merge and phf formatting.
    • derive-impl/src/pyclass.rs and derive-impl/src/pymodule.rs — verify DB key formats and null-handling.
    • Cargo.toml workspace dependency/path changes — ensure workspace resolution and phf feature set.

Possibly related PRs

Suggested reviewers

  • youknowone

Poem

🐰 With twitching nose and bounding hop,

I gathered docs from bottom to top.
A phf-made chest of Python lore,
Now stitched inside the crate's core.
Hop, compile, and fetch with glee — hooray for doc delivery! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: moving the doc crate from an external repository into crates/doc within the repository.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4bee896 and 9e5ed29.

📒 Files selected for processing (1)
  • .github/workflows/ci.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-11T09:35:43.571Z
Learnt from: ShaharNaveh
Repo: RustPython/RustPython PR: 5947
File: .github/workflows/ci.yaml:116-116
Timestamp: 2025-07-11T09:35:43.571Z
Learning: GitHub Actions timeout-minutes field supports expressions that evaluate to integers, such as `${{ contains(matrix.os, 'windows') && 40 || 30 }}`, which will conditionally set different timeout values based on the runner OS.

Applied to files:

  • .github/workflows/ci.yaml
📚 Learning: 2025-07-11T09:35:43.571Z
Learnt from: ShaharNaveh
Repo: RustPython/RustPython PR: 5947
File: .github/workflows/ci.yaml:116-116
Timestamp: 2025-07-11T09:35:43.571Z
Learning: GitHub Actions supports ternary-like expressions using `&&` and `||` operators in timeout-minutes. The syntax `${{ condition && value_if_true || value_if_false }}` is valid and commonly used, such as `${{ contains(matrix.os, 'windows') && 40 || 30 }}` which evaluates to 40 for Windows runners and 30 for others.

Applied to files:

  • .github/workflows/ci.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Run tests under miri
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: auto_format
🔇 Additional comments (1)
.github/workflows/ci.yaml (1)

116-116: Timeout increases are well-justified given PR context.

The conditional timeout expressions correctly identify Windows runners (45 minutes) vs others (35 minutes), accommodating slower Windows builds. This aligns with the PR's introduction of crates/doc, new phf dependency, and workspace-wide cargo changes that increase compile time. The expression syntax using && and || is valid and safe since both timeout values are non-falsy integers.

Consider monitoring whether these timeout values prove sufficient during CI runs, especially given the addition of ~9,000 documentation entries being processed via phf. If timeouts are still exceeded, you may need to further increase the values or optimize the build process.

Also applies to: 242-242


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ShaharNaveh ShaharNaveh marked this pull request as ready for review November 7, 2025 15:42
if-no-files-found: error
retention-days: 7
overwrite: true
pr:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ain't sure if we want to open a PR, or just push the generated artifacts

Copy link
Member

@youknowone youknowone Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to update it only when target Python version is changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so probably not worth the effort in automating a PR

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
crates/rustpython_doc_db/src/lib.rs (1)

10-19: Consider expanding test coverage.

The single test validates basic DB functionality, but additional tests would strengthen confidence in the generated database. Consider adding tests for:

  • Multiple module paths (builtins, stdlib modules)
  • Non-existent keys returning None
  • Documentation content format validation
crates/rustpython_doc_db/generate.py (1)

36-36: Replace Python 3.12+ type statement with backwards-compatible syntax.

The type statement for type aliases was introduced in Python 3.12. While the workflow specifies Python 3.13.9, using a more compatible syntax ensures the script works with older Python versions.

Apply this diff:

-type Parts = tuple[str, ...]
+Parts = tuple[str, ...]

Alternatively, if type hints are important, use typing.TypeAlias:

+from typing import TypeAlias
+
-type Parts = tuple[str, ...]
+Parts: TypeAlias = tuple[str, ...]
.github/workflows/update-doc-db.yml (1)

56-56: Intentionally disabled PR job with confusing condition.

The condition if: false && ${{ inputs.open_pr }} is always false due to the leading false &&. While this effectively disables the job, it's confusing. Consider using a clearer approach:

-  if: false && ${{ inputs.open_pr }}
+  if: false  # TODO: Re-enable when PR automation is ready

Or remove the job entirely until it's ready to be used.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cad66c and 83ab568.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • .gitattributes (1 hunks)
  • .github/workflows/update-doc-db.yml (1 hunks)
  • Cargo.toml (3 hunks)
  • crates/rustpython_doc_db/Cargo.toml (1 hunks)
  • crates/rustpython_doc_db/LICENSE (1 hunks)
  • crates/rustpython_doc_db/generate.py (1 hunks)
  • crates/rustpython_doc_db/src/lib.rs (1 hunks)
  • derive-impl/Cargo.toml (2 hunks)
  • derive-impl/src/lib.rs (0 hunks)
  • derive-impl/src/pyclass.rs (2 hunks)
  • derive-impl/src/pymodule.rs (3 hunks)
💤 Files with no reviewable changes (1)
  • derive-impl/src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • derive-impl/src/pyclass.rs
  • derive-impl/src/pymodule.rs
  • crates/rustpython_doc_db/src/lib.rs
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • derive-impl/src/pyclass.rs
  • derive-impl/src/pymodule.rs
  • crates/rustpython_doc_db/src/lib.rs
  • crates/rustpython_doc_db/Cargo.toml
  • Cargo.toml
  • crates/rustpython_doc_db/generate.py
  • derive-impl/Cargo.toml
  • .gitattributes
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations

Applied to files:

  • derive-impl/src/pyclass.rs
  • derive-impl/src/pymodule.rs
  • crates/rustpython_doc_db/src/lib.rs
  • crates/rustpython_doc_db/Cargo.toml
  • Cargo.toml
  • crates/rustpython_doc_db/generate.py
  • derive-impl/Cargo.toml
  • .gitattributes
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/rustpython_doc_db/src/lib.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/rustpython_doc_db/src/lib.rs
📚 Learning: 2025-08-09T22:56:24.527Z
Learnt from: jackoconnordev
Repo: RustPython/RustPython PR: 6086
File: benches/microbenchmarks/sort.py:1-3
Timestamp: 2025-08-09T22:56:24.527Z
Learning: In RustPython's microbenchmarks (benches/microbenchmarks/*.py), the variable `ITERATIONS` is intentionally used without being defined in the Python files. It is injected by the cargo bench harness at runtime. This pattern should be maintained for consistency across all microbenchmarks, and F821 lint warnings for undefined `ITERATIONS` are expected and acceptable in this context.

Applied to files:

  • Cargo.toml
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to **/*.rs : Run clippy and fix any warnings or lints introduced by your changes

Applied to files:

  • derive-impl/Cargo.toml
  • .gitattributes
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to **/*.rs : Format Rust code with the default rustfmt style (run `cargo fmt`)

Applied to files:

  • .gitattributes
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to **/*.rs : Follow Rust best practices for error handling and memory management

Applied to files:

  • .gitattributes
🧬 Code graph analysis (1)
crates/rustpython_doc_db/generate.py (3)
Lib/platform.py (2)
  • platform (1193-1265)
  • python_version (1125-1133)
Lib/importlib/_bootstrap_external.py (1)
  • ExtensionFileLoader (1267-1317)
Lib/warnings.py (1)
  • catch_warnings (442-515)
🪛 actionlint (1.7.8)
.github/workflows/update-doc-db.yml

56-56: if: condition "false && ${{ inputs.open_pr }}" is always evaluated to true because extra characters are around ${{ }}

(if-cond)

🪛 Flake8 (7.3.0)
crates/rustpython_doc_db/generate.py

[error] 36-36: SyntaxError: invalid syntax

(E999)

🔇 Additional comments (19)
.gitattributes (1)

8-8: LGTM!

Correctly marks the generated PHF map files as linguist-generated, which is appropriate for code-generated artifacts.

crates/rustpython_doc_db/LICENSE (1)

1-277: LGTM!

The Python Software Foundation License is appropriate for a crate containing documentation derived from Python's standard library.

crates/rustpython_doc_db/src/lib.rs (1)

1-8: LGTM!

Platform-specific conditional compilation is correctly structured. The approach ensures the appropriate documentation database is included for each target OS.

Cargo.toml (3)

76-78: LGTM!

Setting codegen-units = 1 for the doc database in release builds is a good optimization. Since this crate changes infrequently and contains a large static PHF map, maximizing optimization at the cost of longer compile times is appropriate.


139-139: LGTM!

Using the wildcard pattern crates/* for workspace members is cleaner than listing individual crates and allows for easier future additions.


164-164: LGTM!

The rustpython_doc_db and phf dependencies are correctly configured. PHF 0.13.1 with macros feature is appropriate for the perfect hash function implementation.

Also applies to: 171-171

crates/rustpython_doc_db/generate.py (3)

44-49: Verify UNICODE_ESCAPE regex pattern behavior.

The regex replacement at line 45 appears to convert \uNNNN to \u{NNNN} format, but the pattern r"\\u([0-9]+)" will match any sequence of digits (not limited to 4). This could cause issues if the input contains sequences like \u123456.

Consider whether the pattern should be more specific:

-UNICODE_ESCAPE = re.compile(r"\\u([0-9]+)")
+UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

Or if extended unicode is intended:

-UNICODE_ESCAPE = re.compile(r"\\u([0-9]+)")
+UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]+)")

93-109: LGTM!

Properly suppresses DeprecationWarning during module imports and handles import failures gracefully with warnings. This is appropriate for scanning stdlib modules where some may be deprecated or platform-specific.


200-226: LGTM!

The main function correctly generates a Rust PHF map with proper formatting, headers, and platform-specific output paths. The sorted output ensures deterministic generation.

.github/workflows/update-doc-db.yml (2)

31-53: LGTM!

The generation job correctly runs on all major platforms (Windows, Linux, macOS), executes the Python generator script, detects output changes, and uploads artifacts with appropriate retention settings.


19-19: The review comment is incorrect—Python 3.13.9 is valid.

Python 3.13.9 is the latest Python 3.13 release, released October 14, 2025. The specified version in the workflow is correct and no changes are needed.

Likely an incorrect or invalid review comment.

derive-impl/src/pyclass.rs (2)

9-9: LGTM!

Correctly imports the centralized documentation database.


317-322: LGTM!

The migration to the centralized documentation database is correctly implemented. The DB.get(&format!(...)).copied().map(str::to_owned) pattern properly handles the lookup and ownership transfer.

crates/rustpython_doc_db/Cargo.toml (1)

13-14: Remove or update the unnecessary doctest disable.

The crate contains no documentation comments or doctests—the doctest = false setting serves no purpose. The comment "Crashes when true" is unsubstantiated. Either remove the setting or replace the comment with accurate context if there's a specific reason for keeping it disabled.

Likely an incorrect or invalid review comment.

derive-impl/Cargo.toml (2)

14-14: LGTM!

The dependency migration from rustpython-doc to rustpython_doc_db correctly reflects the PR's objective to move the documentation database into the monorepo.


26-26: LGTM!

The lint configuration is now correctly inheriting from the workspace.

derive-impl/src/pymodule.rs (3)

9-9: LGTM!

The import of DB from rustpython_doc_db correctly provides access to the static documentation database.


104-104: LGTM!

The migration to the static DB lookup is correctly implemented. The pattern DB.get(module_name).copied().map(str::to_owned) appropriately converts the PHF map's Option<&&'static str> to Option<String>, and the fallback chain ensures code-level documentation takes precedence.


461-466: LGTM!

The item-level documentation lookup correctly migrates to the DB-based approach. The key format "{module}.{py_name}" follows Python's standard naming convention, and the fallback logic preserves the precedence of code-level documentation.

The format! allocation on each lookup is a minor cost during proc-macro expansion, which is acceptable given this code runs at compile time rather than runtime.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83ab568 and 6140114.

📒 Files selected for processing (1)
  • .github/workflows/update-doc-db.yml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-07-10T10:08:43.330Z
Learnt from: ShaharNaveh
Repo: RustPython/RustPython PR: 5932
File: .github/workflows/comment-commands.yml:18-24
Timestamp: 2025-07-10T10:08:43.330Z
Learning: In GitHub Actions workflows for the RustPython project, the maintainer ShaharNaveh prefers to keep workflows simple and doesn't mind if steps fail when the desired state is already achieved (e.g., user already assigned to an issue). Avoid suggesting complex error handling for edge cases they don't consider problematic.

Applied to files:

  • .github/workflows/update-doc-db.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (1)
.github/workflows/update-doc-db.yml (1)

31-32: Sparse-checkout is sufficient for generate.py.

Verification confirms that generate.py is completely self-contained within crates/rustpython_doc_db and does not require access to files outside this directory. The script:

  • Anchors all operations to CRATE_DIR (the script's parent directory)
  • Writes output exclusively to CRATE_DIR / "src" / *.inc.rs
  • Contains no file read operations, config file loading, or external data dependencies
  • Uses only standard library imports with no workspace or sibling crate references

The workflow's sparse-checkout configuration is correct and sufficient.

Comment on lines 38 to 50
- name: Generate (${{ matrix.os }})
run: python crates/rustpython_doc_db/generate.py

- name: Detect output file
id: diff
run: echo "diff=$(git diff --name-only)" >> $GITHUB_OUTPUT

- uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: doc-db-${{ env.PYTHON_VERSION }}-${{ matrix.os }}
path: ${{ steps.diff.outputs.diff }}
if-no-files-found: error
retention-days: 7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Major: Workflow fails if generate script produces no changes.

The workflow detects output via git diff --name-only (line 43) and uploads artifacts with if-no-files-found: error (line 49). If the generated output is already up-to-date (no changes), the job fails, which contradicts the use case where this workflow may be idempotent.

Additionally, line 39 runs the generate script with no explicit error handling—if it fails, the workflow continues to the artifact upload step.

Consider:

  1. For idempotent behavior: Use if-no-files-found: warn instead of error to allow the workflow to complete gracefully when no changes are detected.
  2. For explicit error handling: Add set -e or explicit checks after the generate step.

Apply this diff:

      - name: Generate (${{ matrix.os }})
        run: python crates/rustpython_doc_db/generate.py

      - name: Detect output file
        id: diff
        run: echo "diff=$(git diff --name-only)" >> $GITHUB_OUTPUT

      - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
        with:
          name: doc-db-${{ env.PYTHON_VERSION }}-${{ matrix.os }}
          path: ${{ steps.diff.outputs.diff }}
-         if-no-files-found: error
+         if-no-files-found: warn

Based on learnings, the maintainers prefer simplicity and accept workflows that may fail gracefully in edge cases; however, the intent here is unclear.

🤖 Prompt for AI Agents
.github/workflows/update-doc-db.yml lines 38-50: the workflow will fail when the
generate script produces no git-diff (upload-artifact uses if-no-files-found:
error) and also doesn't stop early if the generate script itself fails; change
the upload step to use if-no-files-found: warn to make idempotent runs succeed
with no changes, and make the generate step fail fast by running the script
under a strict shell (e.g., add set -e before running the python command or
check the script exit status and exit non-zero on failure) so the job correctly
surfaces generation errors.

@youknowone
Copy link
Member

Looks great, I have one more question. Is it possible to merge 3 inc files into a single file? They have subsets but can be merged into a single big file. Then we don't need to worry about missing architecture when building

@ShaharNaveh
Copy link
Collaborator Author

Looks great, I have one more question. Is it possible to merge 3 inc files into a single file? They have subsets but can be merged into a single big file. Then we don't need to worry about missing architecture when building

Yeah sure, we can do that.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
crates/rustpython_doc_db/src/lib.rs (1)

7-11: Consider testing DB existence more generally.

The test currently hardcodes a specific key ("array._array_reconstructor"). If the generated data changes or this entry is removed, the test will break. Consider either:

  • Testing that the DB is not empty: assert!(!DB.is_empty());
  • Testing multiple well-known entries to be more robust
     #[test]
     fn test_db() {
-        let doc = DB.get("array._array_reconstructor");
-        assert!(doc.is_some());
+        // Verify DB contains documentation entries
+        assert!(!DB.is_empty(), "DB should contain documentation entries");
+        
+        // Verify at least one known entry exists
+        assert!(
+            DB.get("builtins.dict").is_some() || DB.get("builtins.list").is_some(),
+            "DB should contain common builtin types"
+        );
     }
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a432ec and 25cf146.

📒 Files selected for processing (4)
  • .github/workflows/update-doc-db.yml (1 hunks)
  • crates/rustpython_doc_db/.gitignore (1 hunks)
  • crates/rustpython_doc_db/generate.py (1 hunks)
  • crates/rustpython_doc_db/src/lib.rs (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • crates/rustpython_doc_db/.gitignore
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/update-doc-db.yml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Format Rust code with the default rustfmt style (run cargo fmt)
Run clippy and fix any warnings or lints introduced by your changes
Follow Rust best practices for error handling and memory management

Files:

  • crates/rustpython_doc_db/src/lib.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations

Applied to files:

  • crates/rustpython_doc_db/src/lib.rs
  • crates/rustpython_doc_db/generate.py
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/rustpython_doc_db/src/lib.rs
🧬 Code graph analysis (1)
crates/rustpython_doc_db/generate.py (2)
Lib/importlib/_bootstrap_external.py (1)
  • ExtensionFileLoader (1267-1317)
Lib/warnings.py (1)
  • catch_warnings (442-515)
🪛 Flake8 (7.3.0)
crates/rustpython_doc_db/generate.py

[error] 37-37: SyntaxError: invalid syntax

(E999)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (6)
crates/rustpython_doc_db/src/lib.rs (1)

1-1: LGTM: Appropriate use of include! for generated code.

The include! macro is the standard approach for incorporating generated code in Rust.

crates/rustpython_doc_db/generate.py (5)

1-34: LGTM: Imports and constants are well-structured.

Good use of pathlib and proper handling of the output directory creation.


53-92: LGTM: Robust C extension detection logic.

The function uses multiple fallback approaches to reliably detect C extensions.


94-121: LGTM: Proper warning handling and module iteration.

Good use of warnings.catch_warnings() to suppress deprecation warnings while still reporting import failures.


176-176: Verify builtins usage across different contexts.

Using __builtins__ directly can be fragile as it's a dict in __main__ and a module in other contexts. While traverse() handles both cases, it's worth verifying this works correctly.

Consider using the builtins module explicitly:

+import builtins
+
 def find_doc_entires() -> "Iterable[DocEntry]":
     yield from (
         doc_entry
         for module in iter_c_modules()
         for doc_entry in traverse(module, module)
     )
-    yield from (doc_entry for doc_entry in traverse(__builtins__, __builtins__))
+    yield from (doc_entry for doc_entry in traverse(builtins, builtins))
 
     builtin_types = [
         # ...
     ]
     for typ in builtin_types:
         parts = ("builtins", typ.__name__)
         yield DocEntry(parts, pydoc._getowndoc(typ))
-        yield from traverse(typ, __builtins__, parts)
+        yield from traverse(typ, builtins, parts)

37-37: ****

The type statement syntax is appropriate for this codebase. RustPython explicitly targets Python 3.13.0+ (as stated in README.md: "A Python-3 (CPython >= 3.13.0) Interpreter written in Rust"), and the Python 3.12+ type syntax (PEP 695) is consistent with multiple other Python files in the project (scripts/lib_updater.py, Lib/typing.py, Lib/test/typinganndata/mod_generics_cache.py, and Lib/test/test_symtable.py). No compatibility issue exists.

Likely an incorrect or invalid review comment.

Comment on lines 48 to 50
@property
def doc(self) -> str:
return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle None case in doc property.

The doc property calls inspect.cleandoc(self.raw_doc) but raw_doc can be None (as declared in line 42). This will raise an error if raw_doc is None.

Apply this diff to handle the None case:

     @property
     def doc(self) -> str:
-        return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
+        if self.raw_doc is None:
+            return ""
+        return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))

Or change the return type to match the optional nature:

     @property
-    def doc(self) -> str:
+    def doc(self) -> str | None:
+        if self.raw_doc is None:
+            return None
         return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@property
def doc(self) -> str:
return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
@property
def doc(self) -> str:
if self.raw_doc is None:
return ""
return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
Suggested change
@property
def doc(self) -> str:
return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
@property
def doc(self) -> str | None:
if self.raw_doc is None:
return None
return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
🤖 Prompt for AI Agents
In crates/rustpython_doc_db/generate.py around lines 48 to 50, the doc property
calls inspect.cleandoc(self.raw_doc) but raw_doc can be None, which will raise;
update the property to handle the None case by checking if self.raw_doc is None
and returning an appropriate value (e.g., empty string) before calling
inspect.cleandoc and re.sub, or alternatively change the property return type to
Optional[str] and return None when raw_doc is None; ensure the branch avoids
calling inspect.cleandoc with None and keep the rest of the processing
unchanged.

Comment on lines +148 to +159
attr_typ = type(attr)
is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__)))

if is_type_or_builtin:
yield from traverse(attr, module, new_parts)
continue

is_callable = (
callable(attr)
or not issubclass(attr_typ, type)
or attr_typ.__name__ in ("getset_descriptor", "member_descriptor")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential edge cases in type checking logic.

Two potential issues in this type-checking code:

  1. Line 149: type(__builtins__) may not work consistently. __builtins__ can be a dict in some contexts (e.g., __main__) and a module in others.

  2. Line 157: issubclass(attr_typ, type) can raise TypeError if attr_typ is not a class (e.g., if it's a special descriptor or non-type object).

Consider these improvements:

         attr_typ = type(attr)
-        is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__)))
+        # Handle __builtins__ which can be dict or module
+        builtins_type = type(__builtins__) if isinstance(__builtins__, types.ModuleType) else types.ModuleType
+        is_type_or_builtin = any(attr_typ is x for x in (type, builtins_type))
 
         if is_type_or_builtin:
             yield from traverse(attr, module, new_parts)
             continue
 
         is_callable = (
             callable(attr)
-            or not issubclass(attr_typ, type)
+            or not (inspect.isclass(attr_typ) and issubclass(attr_typ, type))
             or attr_typ.__name__ in ("getset_descriptor", "member_descriptor")
         )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/rustpython_doc_db/generate.py around lines 148 to 159, the current
checks use type(__builtins__) and issubclass(attr_typ, type) which can break
when __builtins__ is a dict or attr_typ is not a class; change the logic to use
isinstance(attr, type) to detect real types instead of comparing attr_typ to
type, guard the builtin check by testing whether __builtins__ is a module (e.g.
isinstance(__builtins__, types.ModuleType)) before using its type, and replace
the raw issubclass call with a try/except TypeError (or use isinstance(attr,
type) where appropriate) so non-class attr_typs don’t raise — in short, use
isinstance checks for types, guard builtin-module handling, and catch TypeError
around issubclass to safely fall back to False.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
crates/rustpython_doc_db/generate.py (1)

150-161: Guard against type-checking edge cases.

Two potential runtime errors remain from the previous review:

  1. Line 151: type(__builtins__) may not be consistent—__builtins__ can be a dict in __main__ or a module elsewhere.
  2. Line 159: issubclass(attr_typ, type) raises TypeError if attr_typ is not a class (e.g., a descriptor or special object).

Consider this safer approach:

-    attr_typ = type(attr)
-    is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__)))
+    # Check if attr itself is a type (class)
+    is_type_or_builtin = isinstance(attr, type)
 
     if is_type_or_builtin:
         yield from traverse(attr, module, new_parts)
         continue
 
     is_callable = (
         callable(attr)
-        or not issubclass(attr_typ, type)
+        or not isinstance(attr, type)
         or attr_typ.__name__ in ("getset_descriptor", "member_descriptor")
     )

This avoids the __builtins__ type inconsistency and uses isinstance(attr, type) instead of issubclass, which won't raise TypeError.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25cf146 and 4e21b6f.

📒 Files selected for processing (1)
  • crates/rustpython_doc_db/generate.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations

Applied to files:

  • crates/rustpython_doc_db/generate.py
🧬 Code graph analysis (1)
crates/rustpython_doc_db/generate.py (2)
Lib/importlib/_bootstrap_external.py (1)
  • ExtensionFileLoader (1267-1317)
Lib/warnings.py (1)
  • catch_warnings (442-515)
🪛 Flake8 (7.3.0)
crates/rustpython_doc_db/generate.py

[error] 37-37: SyntaxError: invalid syntax

(E999)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (2)
crates/rustpython_doc_db/generate.py (2)

48-52: Good addition of assertion to document precondition.

The assert self.raw_doc is not None on line 50 clearly documents that this property should only be called when raw_doc is not None. Since main() filters entries (lines 206-207) before accessing .doc, this precondition is satisfied in practice.


203-211: LGTM! Clean filtering and output logic.

The filtering on line 207 (if entry.raw_doc is not None) correctly ensures that only entries with documentation are included, and the subsequent .doc property access is safe. The JSON output formatting is appropriate.

}


type Parts = tuple[str, ...]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify Python 3.12+ requirement or use backwards-compatible syntax.

The type statement for type aliases is Python 3.12+ syntax (PEP 695). If this script needs to support earlier Python versions, use the backwards-compatible form.

Run this script to check the minimum Python version required by the project:

If Python 3.12+ is not guaranteed, apply this diff:

-type Parts = tuple[str, ...]
+Parts: typing.TypeAlias = tuple[str, ...]

🏁 Script executed:

#!/bin/bash
# Check for Python version requirements in the repository
rg -n "python_requires|requires-python|python_version" -g "*.toml" -g "*.cfg" -g "*.ini" -g "pyproject.toml" -g "setup.py" -g "setup.cfg"

Length of output: 89


Let me check for more specific version requirements for the doc_db module and project configuration files.


🏁 Script executed:

#!/bin/bash
# Search for pyproject.toml files and version specifications
find . -name "pyproject.toml" -o -name "Cargo.toml" | head -20 | xargs grep -l "rustpython_doc_db\|python.*=" 2>/dev/null | head -10

Length of output: 313


🏁 Script executed:

#!/bin/bash
# Look for the Cargo.toml in the doc_db crate specifically
fd -n "Cargo.toml" | xargs grep -A5 "rustpython_doc_db" 2>/dev/null

Length of output: 230


🏁 Script executed:

#!/bin/bash
# Locate the doc_db crate and check its configuration
find . -path "*/rustpython_doc_db/Cargo.toml" -o -path "*/rustpython_doc_db/pyproject.toml" | head -10

Length of output: 37


🏁 Script executed:

#!/bin/bash
# Check the actual Cargo.toml for doc_db crate
cat crates/rustpython_doc_db/Cargo.toml 2>/dev/null || echo "File not found"

Length of output: 306


🏁 Script executed:

#!/bin/bash
# Look for Python interpreter version requirement in the generate.py file itself
head -5 crates/rustpython_doc_db/generate.py

Length of output: 74


🏁 Script executed:

#!/bin/bash
# Check if there's a shebang or version comment indicating Python requirements
rg -n "python|Python|version" crates/rustpython_doc_db/generate.py | head -20

Length of output: 112


🏁 Script executed:

#!/bin/bash
# Check project root pyproject.toml for Python version requirements
cat pyproject.toml 2>/dev/null | head -50

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation about tools Python version
find . -name "README*" -o -name "CONTRIBUTING*" | xargs grep -i "python.*version\|version.*python" 2>/dev/null | head -10

Length of output: 172


🏁 Script executed:

#!/bin/bash
# Check for CI/build files that might specify Python version for tools
find . -path ".github/workflows/*.yml" -o -path ".gitlab-ci.yml" -o -path ".circleci/config.yml" -o -path "*.md" | xargs grep -l "generate.py\|python.*3\." 2>/dev/null | head -10

Length of output: 19


🏁 Script executed:

#!/bin/bash
# Check for any scripts directory or Makefile that might call generate.py
find . -name "Makefile" -o -name "*.sh" | xargs grep -l "generate.py" 2>/dev/null

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there's any Python version pinning in requirements or similar files
find . -name "requirements*.txt" -o -name ".python-version" -o -name "tox.ini" | head -10

Length of output: 60


🏁 Script executed:

#!/bin/bash
# Search for any mention of generate.py in documentation
find . -name "*.md" | xargs grep -B2 -A2 "generate.py" 2>/dev/null | head -30

Length of output: 0


Use backwards-compatible type alias syntax.

Line 37 uses the Python 3.12+ type statement (PEP 695), which causes a SyntaxError on Python 3.11 or earlier. Since no explicit Python 3.12+ requirement is documented in the repository, use the backwards-compatible form:

-type Parts = tuple[str, ...]
+Parts: typing.TypeAlias = tuple[str, ...]

Also add import typing to the imports if not already present.

🧰 Tools
🪛 Flake8 (7.3.0)

[error] 37-37: SyntaxError: invalid syntax

(E999)

🤖 Prompt for AI Agents
In crates/rustpython_doc_db/generate.py around line 37, the PEP 695 `type Parts
= tuple[str, ...]` syntax is not compatible with Python <3.12; replace it with a
backwards-compatible alias (e.g., use typing.Tuple: Parts = typing.Tuple[str,
...]) and ensure there is an import typing at the top of the file (add it if
missing).

for module in iter_c_modules()
for doc_entry in traverse(module, module)
)
yield from (doc_entry for doc_entry in traverse(__builtins__, __builtins__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify __builtins__ handling in different contexts.

__builtins__ can be a dict (in __main__) or a module (elsewhere), which may cause issues when passed to traverse() as both object and module. The is_child_of check on line 141 relies on inspect.getmodule(obj), which returns None for non-module types.

Run this script to check how __builtins__ is used and whether it causes issues:


🏁 Script executed:

#!/bin/bash
# Test the script with different Python contexts to see if __builtins__ handling works
python -c "
import types
print('__builtins__ type:', type(__builtins__))
print('Is module:', isinstance(__builtins__, types.ModuleType))
"

# Also check if the script runs without errors
cd crates/rustpython_doc_db
python generate.py 2>&1 | head -20

Length of output: 201


I need to examine the actual implementation of the traverse and is_child_of functions to determine if the __builtins__ dict case is properly handled.



🏁 Script executed:

#!/bin/bash
# Find the traverse function and is_child_of function implementations
rg -nP -A 15 'def traverse\s*\(' crates/rustpython_doc_db/

Length of output: 1119


🏁 Script executed:

#!/bin/bash
# Find the is_child_of function
rg -nP -A 10 'def is_child_of\s*\(' crates/rustpython_doc_db/

Length of output: 656


🏁 Script executed:

#!/bin/bash
# Also examine the find_doc_entries function around line 178
sed -n '170,205p' crates/rustpython_doc_db/generate.py

Length of output: 1029


Fix __builtins__ handling to work in __main__ context.

The original concern is correct: __builtins__ is a dict when the script runs in __main__, causing is_child_of() to fail. Lines 178, 179, and 199 pass __builtins__ to traverse(), which expects a types.ModuleType. When is_child_of() executes inspect.getmodule(obj) is module, comparing a module/None against a dict always returns False, breaking documentation collection.

Recommended fix: Import the actual builtins module and use it instead of __builtins__:

import builtins
# Replace __builtins__ with builtins in lines 178, 179, and 199
🤖 Prompt for AI Agents
In crates/rustpython_doc_db/generate.py around lines 178, 179 and 199, passing
__builtins__ (a dict when run as __main__) into traverse() breaks is_child_of()
which expects a module; import the standard builtins module and replace
occurrences of __builtins__ at those lines with the builtins module so traverse
receives a types.ModuleType instead of a dict.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much!

@coolreader18 you have to review crate naming policy not to block these changes

#[cfg(any(target_os = "linux", target_os = "android"))]
include!("./linux.inc.rs");

#[cfg(any(target_os = "macos", target_os = "ios"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(any(target_os = "macos", target_os = "ios"))]
#[cfg(target_vendor = "apple")]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is outdated. now everything is merged into a single file

Cargo.toml Outdated
rustpython-sre_engine = { path = "vm/sre_engine", version = "0.4.0" }
rustpython-wtf8 = { path = "wtf8", version = "0.4.0" }
rustpython-doc = { git = "https://github.com/RustPython/__doc__", tag = "0.3.0", version = "0.3.0" }
rustpython_doc_db = { path = "crates/rustpython_doc_db", version = "0.4.0" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rustpython_doc_db = { path = "crates/rustpython_doc_db", version = "0.4.0" }
rustpython-doc_db = { path = "crates/rustpython_doc_db", version = "0.4.0" }

- is a namespace split, _ is a (sort of) whitespace split

Copy link
Collaborator Author

@ShaharNaveh ShaharNaveh Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is worth discussing more. I haven't seen a single project that has both - and _ in it's name. I think we should stick to one separator and use it (I'm in favor of _)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have that names for every crates.
I don't think mixing rustpython- and rustpython_ for different crates gives any value.

Actually I was missing the fundamental question about this change.
We'd better to keep using the previous crate name rustpython-doc. Moving crate doesn't mean renaming(creating a new) crate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have that names for every crates. I don't think mixing rustpython- and rustpython_ for different crates gives any value.

think we should choose one sperator and stick to it, as long as it's consistent.

Actually I was missing the fundamental question about this change. We'd better to keep using the previous crate name rustpython-doc. Moving crate doesn't mean renaming(creating a new) crate.

Yeah, ofc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we should choose one sperator and stick to it, as long as it's consistent.

I understand that there’s some validity to that point.
However, it’s not fair to justify it simply by calling it “consistent,” because the fact that all RustPython crates share the rustpython- prefix is already a form of consistency in itself.
It’s just a different kind of consistency.
If these two forms of consistency conflict, I would reject introducing a new crate with a different prefix.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
crates/rustpython-doc/generate.py (3)

37-37: Critical: Use backwards-compatible type alias syntax.

The PEP 695 type statement requires Python 3.12+. Since no Python 3.12+ requirement is documented, use the backwards-compatible form to ensure the script works on earlier Python versions.

Apply this diff:

-type Parts = tuple[str, ...]
+Parts: typing.TypeAlias = tuple[str, ...]

178-201: Critical: Fix __builtins__ handling for __main__ context.

When this script runs as __main__, __builtins__ is a dict, not a module. Lines 178 and 200 pass __builtins__ to traverse(), which expects a types.ModuleType as the second parameter. This causes is_child_of() to fail since inspect.getmodule(obj) is module will compare a module/None against a dict.

Apply this diff:

+import builtins
+
 def find_doc_entries() -> "Iterable[DocEntry]":
     yield from (
         doc_entry
         for module in iter_c_modules()
         for doc_entry in traverse(module, module)
     )
-    yield from (doc_entry for doc_entry in traverse(__builtins__, __builtins__))
+    yield from (doc_entry for doc_entry in traverse(builtins, builtins))

     builtin_types = [
         type(None),
         type(bytearray().__iter__()),
         ...
     ]
     for typ in builtin_types:
         parts = ("builtins", typ.__name__)
         yield DocEntry(parts, pydoc._getowndoc(typ))
-        yield from traverse(typ, __builtins__, parts)
+        yield from traverse(typ, builtins, parts)

150-161: Guard against TypeError in type checking logic.

Line 159's issubclass(attr_typ, type) can raise TypeError if attr_typ is not a class (e.g., special descriptors, generic aliases). The callable(attr) check doesn't fully protect against this.

Additionally, line 151's type(__builtins__) is fragile since __builtins__ can be a dict in some contexts.

Apply this diff to make the checks more robust:

 attr_typ = type(attr)
-is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__)))
+# Check if attr itself is a type/class
+is_type_or_builtin = isinstance(attr, type)

 if is_type_or_builtin:
     yield from traverse(attr, module, new_parts)
     continue

 is_callable = (
     callable(attr)
-    or not issubclass(attr_typ, type)
+    or not (inspect.isclass(attr_typ) and issubclass(attr_typ, type))
     or attr_typ.__name__ in ("getset_descriptor", "member_descriptor")
 )
🧹 Nitpick comments (1)
crates/rustpython-doc/generate.py (1)

48-52: Consider a more defensive approach for the doc property.

The assert at line 50 assumes raw_doc is not None, which is currently safe because main() (line 207) filters out entries with None raw_doc before calling .doc. However, adding a more descriptive error or returning an empty string would make this more defensive.

Optional improvement:

 @property
 def doc(self) -> str:
-    assert self.raw_doc is not None
-
+    if self.raw_doc is None:
+        return ""
     return re.sub(UNICODE_ESCAPE, r"\\u{\1}", inspect.cleandoc(self.raw_doc))
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e21b6f and 4544be3.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • .github/workflows/update-doc-db.yml (1 hunks)
  • Cargo.toml (3 hunks)
  • crates/rustpython-doc/.gitignore (1 hunks)
  • crates/rustpython-doc/Cargo.toml (1 hunks)
  • crates/rustpython-doc/LICENSE (1 hunks)
  • crates/rustpython-doc/generate.py (1 hunks)
  • crates/rustpython-doc/src/lib.rs (1 hunks)
  • derive-impl/Cargo.toml (1 hunks)
  • derive-impl/src/pyclass.rs (2 hunks)
  • derive-impl/src/pymodule.rs (3 hunks)
✅ Files skipped from review due to trivial changes (3)
  • crates/rustpython-doc/Cargo.toml
  • crates/rustpython-doc/src/lib.rs
  • crates/rustpython-doc/.gitignore
🚧 Files skipped from review as they are similar to previous changes (4)
  • derive-impl/src/pymodule.rs
  • derive-impl/src/pyclass.rs
  • .github/workflows/update-doc-db.yml
  • derive-impl/Cargo.toml
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/**/*.py : Minimize modifications to CPython standard library files; only make minimal changes to work around RustPython limitations

Applied to files:

  • crates/rustpython-doc/LICENSE
  • crates/rustpython-doc/generate.py
  • Cargo.toml
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to Lib/test/**/*.py : When modifying CPython tests, add a '# TODO: RUSTPYTHON' comment documenting the reason

Applied to files:

  • crates/rustpython-doc/generate.py
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: Applies to {vm,stdlib}/**/*.rs : Use RustPython macros (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • Cargo.toml
📚 Learning: 2025-08-09T22:56:24.527Z
Learnt from: jackoconnordev
Repo: RustPython/RustPython PR: 6086
File: benches/microbenchmarks/sort.py:1-3
Timestamp: 2025-08-09T22:56:24.527Z
Learning: In RustPython's microbenchmarks (benches/microbenchmarks/*.py), the variable `ITERATIONS` is intentionally used without being defined in the Python files. It is injected by the cargo bench harness at runtime. This pattern should be maintained for consistency across all microbenchmarks, and F821 lint warnings for undefined `ITERATIONS` are expected and acceptable in this context.

Applied to files:

  • Cargo.toml
📚 Learning: 2025-09-11T05:06:59.971Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-09-11T05:06:59.971Z
Learning: In most cases, avoid editing Python code; prefer fixing bugs via Rust changes

Applied to files:

  • Cargo.toml
🧬 Code graph analysis (1)
crates/rustpython-doc/generate.py (2)
Lib/importlib/_bootstrap_external.py (1)
  • ExtensionFileLoader (1267-1317)
Lib/warnings.py (1)
  • catch_warnings (442-515)
🪛 Flake8 (7.3.0)
crates/rustpython-doc/generate.py

[error] 37-37: SyntaxError: invalid syntax

(E999)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: auto_format
🔇 Additional comments (5)
crates/rustpython-doc/LICENSE (1)

1-277: LGTM: Standard Python license file.

The Python Software Foundation license text is appropriate for a crate that handles Python standard library documentation.

Cargo.toml (4)

76-78: LGTM: Appropriate optimization for a data-heavy crate.

Setting codegen-units = 1 for the doc database crate trades longer compile time for better optimization, which is reasonable for data that "doesn't change often."


139-139: LGTM: Standard workspace glob pattern.

Adding crates/* appropriately includes the new rustpython-doc crate and any future crates in that directory.


164-164: LGTM: Crate successfully migrated to workspace.

The path-based dependency correctly integrates the formerly external rustpython-doc crate into the workspace. The naming convention (dash separator) is consistent with other RustPython crates.


171-171: LGTM: phf dependency for compile-time hash maps.

The phf dependency configuration is appropriate for generating compile-time perfect hash maps, addressing the HashMap issues mentioned in the PR objectives when handling ~9,000 documentation entries.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I will wait for @coolreader18 a few more days about the crate relocation issue.

@ShaharNaveh
Copy link
Collaborator Author

@youknowone I ain't sure about this CI failure. I think it's related to #6237

@ShaharNaveh ShaharNaveh changed the title Move __doc__ crate to crates/rustpython_doc_db Move __doc__ crate to crates/doc Nov 14, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
crates/doc/generate.py (4)

37-37: Critical: Use backwards-compatible type alias syntax.

Line 37 uses Python 3.12+ type statement syntax (PEP 695), which will cause a SyntaxError on Python 3.11 or earlier. Since no explicit Python 3.12+ requirement is documented, use the backwards-compatible form.

Apply this diff:

-type Parts = tuple[str, ...]
+Parts: typing.TypeAlias = tuple[str, ...]

150-161: Fix type checking logic to avoid errors and false positives.

Two issues in this type-checking code:

  1. Line 151: type(__builtins__) is unreliable because __builtins__ can be a dict in __main__ context or a module elsewhere. Comparing attr_typ to type(__builtins__) will give inconsistent results.

  2. Line 159: issubclass(attr_typ, type) can raise TypeError if attr_typ is not a class (e.g., if it's a descriptor or special object).

Consider this approach:

-        attr_typ = type(attr)
-        is_type_or_builtin = any(attr_typ is x for x in (type, type(__builtins__)))
+        # Check if attr itself is a type/class
+        is_type_or_builtin = isinstance(attr, type)
 
         if is_type_or_builtin:
             yield from traverse(attr, module, new_parts)
             continue
 
-        is_callable = (
-            callable(attr)
-            or not issubclass(attr_typ, type)
-            or attr_typ.__name__ in ("getset_descriptor", "member_descriptor")
-        )
+        attr_typ = type(attr)
+        try:
+            is_subclass_of_type = issubclass(attr_typ, type)
+        except TypeError:
+            is_subclass_of_type = False
+        
+        is_callable = (
+            callable(attr)
+            or not is_subclass_of_type
+            or attr_typ.__name__ in ("getset_descriptor", "member_descriptor")
+        )

178-178: Critical: Import and use the builtins module instead of builtins.

Using __builtins__ directly is problematic because it's a dict when the script runs as __main__ but a module in other contexts. When it's a dict, traverse() and is_child_of() will fail because they expect types.ModuleType.

Apply this fix:

+import builtins
+
 def find_doc_entries() -> "Iterable[DocEntry]":
     yield from (
         doc_entry
         for module in iter_c_modules()
         for doc_entry in traverse(module, module)
     )
-    yield from (doc_entry for doc_entry in traverse(__builtins__, __builtins__))
+    yield from (doc_entry for doc_entry in traverse(builtins, builtins))

Also update line 200 to use builtins instead of __builtins__.


197-200: Apply the same builtins fix here.

Line 200 also uses __builtins__ and needs the same fix as line 178. After importing builtins at the top:

     for typ in builtin_types:
         parts = ("builtins", typ.__name__)
         yield DocEntry(parts, pydoc._getowndoc(typ))
-        yield from traverse(typ, __builtins__, parts)
+        yield from traverse(typ, builtins, parts)
🧹 Nitpick comments (2)
crates/doc/Cargo.toml (1)

13-14: Clarify the doctest crash and consider alternatives.

The comment "Crashes when true" is vague and doesn't explain the root cause. Please:

  1. Document the specific error message or panic that occurs
  2. Consider whether this is a fundamental limitation or a solvable issue
  3. If it's due to the included generated data, add a more descriptive comment

Understanding the underlying issue will help future maintainers decide whether this is a permanent workaround or something that can be fixed.

crates/doc/generate.py (1)

48-52: Consider handling None more gracefully in the doc property.

The assert at line 50 will prevent crashes when raw_doc is None, but assertions can be disabled with Python's -O flag. Since main() already filters for raw_doc is not None at line 207, this shouldn't trigger in practice. However, for robustness, consider either:

  1. Returning an empty string when raw_doc is None
  2. Changing the return type to str | None and returning None

This makes the API more defensive and self-documenting.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4544be3 and 4bee896.

📒 Files selected for processing (6)
  • Cargo.toml (3 hunks)
  • crates/doc/.gitignore (1 hunks)
  • crates/doc/Cargo.toml (1 hunks)
  • crates/doc/LICENSE (1 hunks)
  • crates/doc/generate.py (1 hunks)
  • crates/doc/src/lib.rs (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • crates/doc/.gitignore
  • crates/doc/LICENSE
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-09T22:56:24.527Z
Learnt from: jackoconnordev
Repo: RustPython/RustPython PR: 6086
File: benches/microbenchmarks/sort.py:1-3
Timestamp: 2025-08-09T22:56:24.527Z
Learning: In RustPython's microbenchmarks (benches/microbenchmarks/*.py), the variable `ITERATIONS` is intentionally used without being defined in the Python files. It is injected by the cargo bench harness at runtime. This pattern should be maintained for consistency across all microbenchmarks, and F821 lint warnings for undefined `ITERATIONS` are expected and acceptable in this context.

Applied to files:

  • Cargo.toml
🪛 Flake8 (7.3.0)
crates/doc/generate.py

[error] 37-37: SyntaxError: invalid syntax

(E999)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: auto_format
🔇 Additional comments (10)
Cargo.toml (4)

76-78: LGTM! Appropriate optimization for a data-heavy crate.

Setting codegen-units = 1 for the rustpython-doc package enables maximum optimization, which is appropriate given that this crate contains static documentation data that changes infrequently.


139-139: LGTM! Clean workspace organization.

Using the "crates/*" glob pattern is a good practice that simplifies adding new crates to the workspace.


164-164: Verify the crate path matches the PR objectives.

The PR title mentions moving to crates/rustpython_doc_db, but the actual path here is crates/doc. Please confirm this is intentional, as the discrepancy could cause confusion for future maintainers.


171-171: phf 0.13.1 is confirmed as the latest stable version.

The dependency configuration is correct and up-to-date. The explicit default-features = false with the "macros" feature is an appropriate choice for this use case.

crates/doc/Cargo.toml (1)

1-9: LGTM! Standard crate configuration.

The package metadata properly inherits workspace values and is well-configured.

crates/doc/src/lib.rs (2)

1-1: Verify the build process generates data.inc.rs.

The included file data.inc.rs is not present in the PR (likely gitignored). Please ensure that:

  1. The build process (or a pre-build step) generates this file
  2. Documentation explains how to regenerate it
  3. CI validates that the generated file is up-to-date

3-12: LGTM! Adequate smoke test for the generated documentation database.

The test validates that the generated DB contains expected entries, which is appropriate for ensuring the data generation pipeline works correctly.

crates/doc/generate.py (3)

1-20: LGTM! Clean script setup with appropriate OS-specific output paths.

The imports and output file configuration are well-structured.


55-83: LGTM! Well-implemented C extension detection.

The function uses multiple fallback checks and is well-documented.


203-214: LGTM! Clean data generation and output.

The main function properly filters entries and produces well-formatted JSON output.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! And thank you so much for your patience

@youknowone youknowone merged commit db1adaa into RustPython:main Nov 14, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants