Statistics inv_cdf sync with corresponding random module normal distributions #95265

Merged: 2 commits, Jul 26, 2022

Contributor

@rhettinger rhettinger commented Jul 26, 2022

Remove the inconsistent restriction that inv_cdf() is undefined when sigma is zero.

  • Aligns the C _normal_dist_inv_cdf() function with its pure Python equivalent.

  • Restores the invariant that NormalDist(mu, sigma).inv_cdf(p) is equivalent to NormalDist(0.0, 1.0).inv_cdf(p) * sigma + mu. In general, operations on NormalDist should always match the rescaled and recentered results for the same operation on the standard normal distribution.

  • Restores alignment with random.gauss(mu, sigma) and random.normalvariate(mu, sigma), both of which are equivalent to sampling from NormalDist(mu, sigma).inv_cdf(random()). The two functions in the random module happily accept sigma=0 and give a well-defined result.

  • This also lets the function gently handle a sigma getting smaller, eventually becoming zero. As sigma decreases, NormalDist(mu, sigma).inv_cdf(p) forms a tighter and tighter interval around mu, becoming exactly mu in the limit. For example, NormalDist(100, 1E-300).inv_cdf(0.3) cleanly evaluates to 100.0, but with sigma=1e-500 the function previously raised an unexpected error.

  • The only opposing idea is that inv_cdf() is meant to give the inverse of cdf(), but the cdf isn't defined with sigma=0. This is fine, though: all supported inputs to cdf() are still invertible. There is just a relaxation of a restriction beyond the supported range, which makes inv_cdf() obey other invariants and handle edge cases cleanly.

  • The edit also gives a very small performance improvement.
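
The invariant and the limiting case described in the bullets above can be checked with a short script. This is a minimal sketch, not code from the PR: `rescaled_inv_cdf` is a hypothetical helper name, and the `try`/`except` is there only because interpreters that predate this change are assumed to still reject sigma=0.

```python
from math import isclose
from statistics import NormalDist, StatisticsError

STANDARD = NormalDist(0.0, 1.0)


def rescaled_inv_cdf(mu: float, sigma: float, p: float) -> float:
    """Hypothetical helper: inv_cdf built from the standard normal distribution."""
    return STANDARD.inv_cdf(p) * sigma + mu


# For sigma > 0, the method agrees with the rescaled, recentered standard normal.
for mu, sigma, p in [(100.0, 15.0, 0.3), (-2.5, 0.5, 0.975), (0.0, 1e-300, 0.3)]:
    assert isclose(NormalDist(mu, sigma).inv_cdf(p), rescaled_inv_cdf(mu, sigma, p))

# With sigma == 0, the rescaled form collapses to mu.
assert rescaled_inv_cdf(100.0, 0.0, 0.3) == 100.0

# The literal 1e-500 underflows to 0.0, so it is really the sigma == 0 case.
assert 1e-500 == 0.0

try:
    result = NormalDist(100, 1e-500).inv_cdf(0.3)
except StatisticsError:
    # Assumed behavior of interpreters without this change: sigma == 0 is rejected.
    result = None

assert result in (None, 100.0)
```

On an interpreter that includes this change, `result` should be `100.0`; the helper gives the same answer on any version because it never passes sigma=0 to `inv_cdf()` itself.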

@rhettinger rhettinger added the type-bug (An unexpected behavior, bug, or error), skip issue, and skip news labels Jul 26, 2022
@rhettinger rhettinger changed the title from "Statistics invcdf sync" to "Statistics inv_cdf sync with corresponding random module normal distributions" Jul 26, 2022
@rhettinger rhettinger merged commit 4395ff1 into python:main Jul 26, 2022
jackoconnordev added a commit to jackoconnordev/RustPython that referenced this pull request Jul 25, 2025
See python/cpython#95265.

To quote:
> Restores alignment with random.gauss(mu, sigma) and
random.normalvariate(mu, sigma), both of which are equivalent to
sampling from NormalDist(mu, sigma).inv_cdf(random()). The two functions
in the random module happily accept sigma=0 and give a well-defined
result.

> This also lets the function gently handle a sigma getting smaller,
eventually becoming zero. As sigma decreases, NormalDist(mu,
sigma).inv_cdf(p) forms a tighter and tighter interval around mu,
becoming exactly mu in the limit. For example, NormalDist(100,
1E-300).inv_cdf(0.3) cleanly evaluates to 100.0, but with sigma=1e-500
the function previously raised an unexpected error.
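
The alignment with the random module quoted above can be illustrated as follows. This is a sketch under stated assumptions: `sample_via_inv_cdf` is a hypothetical helper, it uses the rescaled standard normal so that it also runs on interpreters without this change, and it redraws a `p` of exactly 0.0 because `inv_cdf()` requires 0.0 < p < 1.0.

```python
import random
from statistics import NormalDist, fmean


def sample_via_inv_cdf(mu: float, sigma: float, rng: random.Random) -> float:
    """Hypothetical helper: inverse-transform sampling of a normal variate."""
    p = rng.random()
    while p == 0.0:  # random() may return 0.0, which inv_cdf() rejects
        p = rng.random()
    return NormalDist().inv_cdf(p) * sigma + mu


rng = random.Random(12345)

# Both random-module samplers accept sigma == 0 and return mu.
assert rng.gauss(100.0, 0.0) == 100.0
assert rng.normalvariate(100.0, 0.0) == 100.0

# Inverse-transform sampling gives the same degenerate result.
assert sample_via_inv_cdf(100.0, 0.0, rng) == 100.0

# For sigma > 0, the draws center on mu, as with the other two samplers.
draws = [sample_via_inv_cdf(100.0, 15.0, rng) for _ in range(10_000)]
assert abs(fmean(draws) - 100.0) < 1.5
```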
youknowone pushed a commit to RustPython/RustPython that referenced this pull request Jul 25, 2025
* Copy CPython 3.13 statistics module into RustPython

* Adjust CPython "magic constants" in KDE tests

## test_kde

I'm not too sure why but this one takes a few seconds to run the second
for loop which calculates the cumulative distribution and does a rough
calculation of the area under the curve.

## test_kde_random

On my laptop, RustPython takes more than an hour to sort a random list
of 1_000_000 numbers. Dropping n to 30_000 keeps the sort from taking an
egregious amount of time. It is then necessary to relax the tolerance
of the math.isclose check; otherwise the computed values may
**randomly** fail because the smaller sample size gives a higher
variance.

* Reintroduce expected failure in test_statistics.TestNormalDist.test_slots

* Sync Rust `normal_dist_inv_cdf` with Python equivalent

See python/cpython#95265.

To quote:
> Restores alignment with random.gauss(mu, sigma) and
random.normalvariate(mu, sigma), both of which are equivalent to
sampling from NormalDist(mu, sigma).inv_cdf(random()). The two functions
in the random module happily accept sigma=0 and give a well-defined
result.

> This also lets the function gently handle a sigma getting smaller,
eventually becoming zero. As sigma decreases, NormalDist(mu,
sigma).inv_cdf(p) forms a tighter and tighter interval around mu,
becoming exactly mu in the limit. For example, NormalDist(100,
1E-300).inv_cdf(0.3) cleanly evaluates to 100.0, but with sigma=1e-500
the function previously raised an unexpected error.
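
The trade-off described under test_kde_random in the commit message above (a smaller sample needs a looser tolerance) can be sketched in isolation. This is not the test from the commit: it samples from a standard normal rather than a kernel density estimate, and `max_cdf_error`, the seed, and the tolerances are all illustrative assumptions.

```python
import math
import random
from bisect import bisect_right
from statistics import NormalDist


def max_cdf_error(n: int, seed: int = 8675309) -> float:
    """Largest gap on a grid between the empirical CDF of n draws and the true CDF."""
    rng = random.Random(seed)
    dist = NormalDist()
    data = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    grid = [i / 10 for i in range(-30, 31)]
    # bisect_right counts the draws <= x, so count / n is the empirical CDF at x.
    return max(abs(bisect_right(data, x) / n - dist.cdf(x)) for x in grid)


small = max_cdf_error(1_000)
large = max_cdf_error(30_000)
print(f"n=1_000: {small:.4f}   n=30_000: {large:.4f}")

# The error shrinks roughly like 1 / sqrt(n), so a tolerance that is safe
# for a large sample can be too strict for a small one.
assert math.isclose(large, 0.0, abs_tol=0.02)
assert math.isclose(small, 0.0, abs_tol=0.1)
```

The tolerances here are deliberately generous so that the assertions hold for essentially any seed; a real test would tune them against the sample size it can afford to sort.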