Add `kde` function and tests to RustPython statistics module #6030

jackoconnordev · 2025-07-24T16:40:03Z

Description

Adds the kde function to RustPython's standard library alongside Python unit tests. Function and tests were copied as is from CPython 3.13, but in order for the new tests to complete on the order of seconds instead of hours I adjusted some "magic constants".

From my local testing, the builtin sorted function is the bottleneck here. CPython will sort a list of 1 million random integers in a few hundred milliseconds making it suitable to be used in a unit test. RustPython is not yet capable of this.

Using `kde` in the interpreter

This example is taken from kde function's docstring.

>>>>> from statistics import kde
>>>>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
>>>>> f_hat = kde(sample, h=1.5)
>>>>> sum(f_hat(x) for x in range(-20, 20))
1.0
>>>>> # ^ sum of probabilities == 1
>>>>> 
>>>>> cdf = kde(sample, h=1.5, cumulative=True)
>>>>> round(cdf(7.5) - cdf(4.5), 2)
0.22
>>>>> # ^ probability of a value between 7.5 and 4.5 being drawn from the sample's estimated probability distribution

whats_left.py diff

Following is the result of diffing the output of python -I whats_left.py with and without the kde function added.

✦ ❯ diff with-kde.txt without-kde.txt 
1449a1450,1467
> statistics._SQRT2
> statistics.__annotations__
> statistics._decimal_sqrt_of_frac
> statistics._float_sqrt_of_frac
> statistics._integer_sqrt_of_frac_rto
> statistics._kernel_invcdfs
> statistics._mean_stdev
> statistics._newton_raphson
> statistics._quartic_invcdf
> statistics._quartic_invcdf_estimate
> statistics._rank
> statistics._sqrt_bit_width
> statistics._sqrtprod
> statistics._triweight_invcdf
> statistics._triweight_invcdf_estimate
> statistics.kde
> statistics.kde_random
> statistics.pi
1710c1728
< #  implemented 94
---
> #  implemented 93
1713,1715c1731,1733
< #  missing_items 115
< #  mismatched_items 67
< #  mismatched_doc_items 60
---
> #  missing_items 116
> #  mismatched_items 68
> #  mismatched_doc_items 61

Example output showing original numbers are too slow to be appropriate

# Example of running TestKDE without adjusting the numbers

✦ ❯ time cargo run --release -- -m test -v -u cpu test_statistics -m TestKDE
    Finished `release` profile [optimized] target(s) in 0.17s
     Running `target/release/rustpython -m test -v -u cpu test_statistics -m TestKDE`
== RustPython 3.13.0alpha (heads/main:96f47a415, Jul 23 2025, 04:28:38) [RustPython 0.4.0 with rustc 1.88.0 (6b00bc388 2025-06-23)]
== Linux-6.12.8-200.fc41.x86_64-x86_64-with-glibc2.35 little-endian
== cwd: /tmp/test_python_318016
== CPU count: 12
== encodings: locale=utf-8, FS=utf-8
Run tests sequentially
0:00:00 load avg: 1.28 [1/1] test_statistics
test_kde (test.test_statistics.TestKDE.test_kde) ... ok
test_kde_kernel_invcdfs (test.test_statistics.TestKDE.test_kde_kernel_invcdfs) ... ok
test_kde_random (test.test_statistics.TestKDE.test_kde_random) ... ^C
^Z
[2]+  Stopped                 cargo run --release -- -m test -v -u cpu test_statistics -m TestKDE

real	55m28.234s
user	0m0.000s
sys	0m0.007s

Summary by CodeRabbit

Bug Fixes
- Improved input validation for the normal distribution inverse CDF function, allowing more flexibility in parameter values.

coderabbitai · 2025-07-24T16:40:12Z

Walkthrough

The input validation in the normal_dist_inv_cdf function was updated to remove the explicit check for sigma <= 0.0. Now, the function only returns None if the probability p is not strictly between 0 and 1. All other logic in the function remains unchanged.

Changes

File(s)	Change Summary
stdlib/src/statistics.rs	Modified input validation in `normal_dist_inv_cdf` to remove the `sigma <= 0.0` check.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

A bunny hopped through fields of code,
Where sigma's check was once bestowed.
Now lighter bounds on p remain,
Simpler logic, less to explain!
In statistics' gentle spring,
The rabbit smiles at what small tweaks can bring.

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

📜 Recent review details

Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96f47a4 and 3ef5264.

⛔ Files ignored due to path filters (2)

Lib/statistics.py is excluded by !Lib/**
Lib/test/test_statistics.py is excluded by !Lib/**

📒 Files selected for processing (1)

stdlib/src/statistics.rs (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style (cargo fmt to format)
Always run clippy to lint code (cargo clippy) before completing tasks. Fix any warnings or lints that are introduced by your changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

stdlib/src/statistics.rs

🔇 Additional comments (1)

stdlib/src/statistics.rs (1)

10-12: LGTM: Improved CPython compatibility by removing early sigma validation.

Removing the sigma <= 0.0 check from the initial validation aligns with the CPython fix mentioned in the PR objectives. This change makes the function more permissive and mathematically consistent - the inverse CDF is primarily constrained by the probability p needing to be in the interval (0,1), while sigma validation can be handled at higher levels if needed.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Lib/statistics.py

jackoconnordev · 2025-07-24T16:53:27Z

I don't know if the tests will pass with this. I still haven't fully figured out a fully working local development setup. I can build a rustpython binary which works as excpected, but cargo test --no-default-features --features stdlib,importlib,stdio,encodings,sqlite,ssl which I grabbed from the CI yaml file gives me the following output.

RUSTPYTHONPATH or PYTHONPATH is set, but it wasn't loaded to `Settings::path_list`. If you are going to customize the RustPython vm/interpreter, those environment variables are not loaded in the Settings struct by default. Please try creating a customized instance of the Settings struct. If you are developing the RustPython interpreter, it might be a bug during development.
If you don't have access to a consistent external environment (e.g. targeting wasm, embedding rustpython in another application), try enabling the `freeze-stdlib` feature.
If this is intended and you want to exclude the encodings module from your interpreter, please remove the `encodings` feature from `rustpython-vm` crate.

Any pointers would be appreciated if anyone spots this and is familiar with the message 🫡.

ShaharNaveh · 2025-07-24T17:04:05Z

Hey @jackoconnordev ty for the PR:)

Does:

cargo run --release -- -m test -j1 -v -u cpu test_statistics -m "TestKDE"

works for you? (assuming you're checked to your branch)

Also, it would be better if you could update both Lib/statistics.py & Lib/test/test_statistics.py from CPython 3.13, as we are trying to refrain from doing partial updates of code from CPython.

If any tests fails you can mark them with:

# TODO: RUSTPYTHON
@unittest.expectedFailure
def test_X(self):
  ...

jackoconnordev · 2025-07-24T17:42:49Z

Happy to and thanks for the quick review! I tried that test command now and got two AttributeErrors:

AttributeError: module 'statistics' has no attribute '_kernel_invcdfs'
AttributeError: module 'statistics' has no attribute 'kde_random'

I'll put this back into Draft and try porting the entire files from CPython 💪.

## test_kde I'm not too sure why but this one takes a few seconds to run the second for loop which calculates the cumulative distribution and does a rough calculation of the area under the curve. ## test_kde_random I have a lower bound for RustPython to sort a random list of 1_000_000 numbers on my laptop of > 1 hour. By dropping n to 30_000 sort will not take an egregious amount of time to run. It is then necessary to lower the tolerance for the math.isclose check, or the computed values may **randomly** fail due to the higher variance caused by the smaller sample size.

youknowone

Thank you for contributing!

@ShaharNaveh Thank you so much for the review!

Test is not run yet

youknowone

test_inv_cdf requires investigation. Adding back @unittest.expectedFailure will be enough for test_slots

…lots

jackoconnordev · 2025-07-25T09:32:10Z

Relevant CPython PR which introduced the failing assertion to test_inv_cdf. I will try to find the equivalent for the CPython C function _normal_dist_inv_cdf in RustPython and try make a similar change.

See python/cpython#95265. To quote: > Restores alignment with random.gauss(mu, sigma) and random.normalvariate(mu, sigma) both. of which are equivalent to sampling from NormalDist(mu, sigma).inv_cdf(random()). The two functions in the random module happy accept sigma=0 and give a well-defined result. > This also lets the function gently handle a sigma getting smaller, eventually becoming zero. As sigma decrease, NormalDist(mu, sigma).inv_cdf(p) forms a tighter and tighter internal around mu and becoming exactly mu in the limit. For example, NormalDist(100, 1E-300).inv_cdf(0.3) cleanly evaluates to 100.0but withsigma=1e-500`` the function previously would raised an unexpected error.

jackoconnordev · 2025-07-25T10:08:56Z

I've run the full test_statistics suite and it's passing on my laptop now cargo run --release -- -m test -v -u cpu test_statistics.

The pytest tests in extra_snippets are passing too. I added a hardcoded path to /Lib in impl Default for Settings and the cargo tests are passing locally too. Is that the normal solution for running cargo test?

youknowone · 2025-07-25T15:41:53Z

I usually don't run cargo test, but I expect it works without any local patch.
What was the problem? Fixing the problem will be better

youknowone

Thank you!

jackoconnordev commented Jul 24, 2025

View reviewed changes

Lib/statistics.py Outdated Show resolved Hide resolved

Lib/statistics.py Show resolved Hide resolved

jackoconnordev changed the title ~~[DRAFT] Add kde function and tests to RustPython statistics module~~ Add kde function and tests to RustPython statistics module Jul 24, 2025

jackoconnordev marked this pull request as ready for review July 24, 2025 16:47

jackoconnordev marked this pull request as draft July 24, 2025 17:43

jackoconnordev added 2 commits July 24, 2025 23:21

Copy CPython 3.13 statistics module into RustPython

2c820e3

jackoconnordev force-pushed the statistics-module-kde-function branch from d8ff7f0 to bdf71bd Compare July 24, 2025 22:21

jackoconnordev marked this pull request as ready for review July 24, 2025 22:36

youknowone approved these changes Jul 25, 2025

View reviewed changes

youknowone previously approved these changes Jul 25, 2025

View reviewed changes

youknowone reviewed Jul 25, 2025

View reviewed changes

Reintroduce expected failure in test_statistics.TestNormalDict.test_s…

3b14ad8

…lots

jackoconnordev requested a review from youknowone July 25, 2025 13:08

youknowone approved these changes Jul 25, 2025

View reviewed changes

youknowone merged commit 4079776 into RustPython:main Jul 25, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add `kde` function and tests to RustPython statistics module #6030

Add `kde` function and tests to RustPython statistics module #6030

Uh oh!

jackoconnordev commented Jul 24, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jul 24, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

Uh oh!

Uh oh!

jackoconnordev commented Jul 24, 2025

Uh oh!

ShaharNaveh commented Jul 24, 2025 •

edited

Loading

Uh oh!

jackoconnordev commented Jul 24, 2025

Uh oh!

youknowone left a comment

Uh oh!

youknowone left a comment

Uh oh!

jackoconnordev commented Jul 25, 2025

Uh oh!

jackoconnordev commented Jul 25, 2025

Uh oh!

youknowone commented Jul 25, 2025

Uh oh!

youknowone left a comment

Uh oh!

Uh oh!

Uh oh!

Add kde function and tests to RustPython statistics module #6030

Add kde function and tests to RustPython statistics module #6030

Uh oh!

Conversation

jackoconnordev commented Jul 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Using kde in the interpreter

whats_left.py diff

Example output showing original numbers are too slow to be appropriate

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

Uh oh!

Uh oh!

jackoconnordev commented Jul 24, 2025

Uh oh!

ShaharNaveh commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jackoconnordev commented Jul 24, 2025

Uh oh!

youknowone left a comment

Choose a reason for hiding this comment

Uh oh!

youknowone left a comment

Choose a reason for hiding this comment

Uh oh!

jackoconnordev commented Jul 25, 2025

Uh oh!

jackoconnordev commented Jul 25, 2025

Uh oh!

youknowone commented Jul 25, 2025

Uh oh!

youknowone left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Add `kde` function and tests to RustPython statistics module #6030

Add `kde` function and tests to RustPython statistics module #6030

jackoconnordev commented Jul 24, 2025 •

edited by coderabbitai bot

Loading

Using `kde` in the interpreter

coderabbitai bot commented Jul 24, 2025 •

edited

Loading

ShaharNaveh commented Jul 24, 2025 •

edited

Loading