Statistics inv_cdf sync with corresponding random module normal distributions #95265


Merged

rhettinger merged 2 commits into python:main from rhettinger:statistics_invcdf_sync

Jul 26, 2022

rhettinger commented Jul 26, 2022

Remove the inconsistent restriction that inv_cdf() is undefined when sigma is zero.

Aligns the C `_normal_dist_inv_cdf()` function with its pure Python equivalent.
Restores the invariant that `NormalDist(mu, sigma).inv_cdf(p)` is equivalent to `NormalDist(0.0, 1.0).inv_cdf(p) * sigma + mu`. In general, operations on NormalDist should always match the rescaled and recentered results of the same operation on the standard normal distribution.
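
A minimal sketch of the rescaling invariant, using only the public `statistics.NormalDist` API. The helper `rescaled_inv_cdf` is hypothetical (not part of the standard library): it inverts the standard normal, then rescales by sigma and recenters on mu. The comparison uses `math.isclose` rather than exact equality so the sketch does not depend on internal rounding details.

```python
import math
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)


def rescaled_inv_cdf(p, mu, sigma):
    """Hypothetical helper: invert the standard normal, then rescale and recenter."""
    return STANDARD.inv_cdf(p) * sigma + mu


# For sigma > 0, the direct call and the rescaled standard result agree.
for mu, sigma in [(0.0, 1.0), (100.0, 15.0), (-3.5, 0.25)]:
    for p in (0.01, 0.3, 0.5, 0.9):
        direct = NormalDist(mu, sigma).inv_cdf(p)
        rescaled = rescaled_inv_cdf(p, mu, sigma)
        assert math.isclose(direct, rescaled, rel_tol=1e-12, abs_tol=1e-12)

# With sigma == 0, the rescaled form collapses to mu for every p.
print(rescaled_inv_cdf(0.3, 100.0, 0.0))  # 100.0
```
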
Restores alignment with `random.gauss(mu, sigma)` and `random.normalvariate(mu, sigma)`, both of which are equivalent to sampling from `NormalDist(mu, sigma).inv_cdf(random())`. Both functions in the random module happily accept `sigma=0` and give a well-defined result.
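
A sketch of the correspondence with the random module, assuming inverse-transform sampling as the reference: push a uniform draw in (0, 1) through the standard normal's `inv_cdf()`, then rescale. The helper `inverse_transform_sample` and the seed are illustrative assumptions; rescaling from `NormalDist()` keeps the `sigma=0` case runnable even on Python versions that predate this change.

```python
import random
from statistics import NormalDist

STANDARD = NormalDist()  # mu=0.0, sigma=1.0


def inverse_transform_sample(rng, mu, sigma):
    """Hypothetical helper: draw one normal variate from a uniform draw.

    Inverts the standard normal and rescales, so sigma == 0 returns mu.
    """
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, but inv_cdf() needs 0 < p < 1
        u = rng.random()
    return STANDARD.inv_cdf(u) * sigma + mu


rng = random.Random(8675309)

# Both random-module normal generators accept sigma=0 and return mu.
print(rng.gauss(10.0, 0.0))                      # 10.0
print(rng.normalvariate(10.0, 0.0))              # 10.0
print(inverse_transform_sample(rng, 10.0, 0.0))  # 10.0

# With sigma > 0, the inverse-transform samples have the expected shape.
data = [inverse_transform_sample(rng, 10.0, 2.0) for _ in range(20_000)]
fitted = NormalDist.from_samples(data)
print(round(fitted.mean, 2), round(fitted.stdev, 2))  # near 10.0 and 2.0
```
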
This also lets the function gracefully handle a sigma that gets smaller and eventually becomes zero. As sigma decreases, `NormalDist(mu, sigma).inv_cdf(p)` forms a tighter and tighter interval around mu, becoming exactly mu in the limit. For example, `NormalDist(100, 1e-300).inv_cdf(0.3)` cleanly evaluates to `100.0`, but with `sigma=1e-500` (a literal that underflows to `0.0`) the function previously raised an unexpected error.
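
The limiting behaviour can be checked directly. In this sketch, mu = 100, the sigma sequence, and the central 90% interval are arbitrary choices for illustration. It confirms that `1e-500` is not a tiny positive float but underflows to `0.0`, then tracks the interval as sigma shrinks, using the rescaling invariant so the final `sigma == 0.0` step does not depend on which Python version is installed.

```python
from statistics import NormalDist

# A float literal this small underflows: "sigma=1e-500" is really sigma=0.0.
print(1e-500 == 0.0)                          # True

# A tiny but nonzero sigma was always accepted; the result rounds to mu.
print(NormalDist(100, 1e-300).inv_cdf(0.3))   # 100.0

# Central 90% interval around mu = 100, built with the rescaling invariant
# so that the last step (sigma == 0.0) runs on any Python version.
z_lo, z_hi = NormalDist().inv_cdf(0.05), NormalDist().inv_cdf(0.95)
widths = []
for sigma in (1.0, 1e-3, 1e-300, 0.0):
    lo, hi = 100 + z_lo * sigma, 100 + z_hi * sigma
    widths.append(hi - lo)
    print(f"sigma={sigma:g}: ({lo!r}, {hi!r}) width={hi - lo:g}")
```
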
The only counterargument is that `inv_cdf()` is meant to be the inverse of `cdf()`, and `cdf()` isn't defined when `sigma=0`. This is fine, though: all supported inputs to `cdf()` are still invertible. The change merely relaxes a restriction outside the supported range, which lets `inv_cdf()` obey the other invariants and handle edge cases cleanly.
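
To illustrate the point about the supported range, the sketch below (parameters chosen arbitrarily) round-trips `cdf(inv_cdf(p))` for a distribution with positive sigma, and shows that `cdf()` still refuses a zero-sigma distribution by raising `StatisticsError`.

```python
from statistics import NormalDist, StatisticsError

nd = NormalDist(100.0, 15.0)

# Within the supported range (sigma > 0), inv_cdf() still inverts cdf().
for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    assert abs(nd.cdf(nd.inv_cdf(p)) - p) < 1e-9

# cdf() remains undefined for a zero-sigma distribution.
cdf_error = None
try:
    NormalDist(100.0, 0.0).cdf(100.0)
except StatisticsError as exc:
    cdf_error = exc
print("cdf() raised:", cdf_error)
```
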
The edit also gives a very small performance improvement.

rhettinger added 2 commits

July 19, 2022 15:20


Sync C code with Python equivalent (baca805)

Let inv_cdf() support the case with a zero stdev (9bca4fb)

rhettinger added type-bug skip issue skip news labels

bedevere-bot added the awaiting core review label

rhettinger changed the title ~~Statistics invcdf sync~~ Statistics inv_cdf sync with corresponding random module normal distributions

rhettinger merged commit 4395ff1 into python:main

bedevere-bot removed the awaiting core review label

jackoconnordev mentioned this pull request

Add kde function and tests to RustPython statistics module RustPython/RustPython#6030

Merged

jackoconnordev added a commit to jackoconnordev/RustPython that referenced this pull request


Sync Rust normal_dist_inv_cdf with Python equivalent (3ef5264)

See python/cpython#95265.

To quote:
> Restores alignment with random.gauss(mu, sigma) and
random.normalvariate(mu, sigma) both. of which are equivalent to
sampling from NormalDist(mu, sigma).inv_cdf(random()). The two functions
in the random module happy accept sigma=0 and give a well-defined
result.

> This also lets the function gently handle a sigma getting smaller,
eventually becoming zero. As sigma decrease, NormalDist(mu,
sigma).inv_cdf(p) forms a tighter and tighter internal around mu and
becoming exactly mu in the limit. For example, NormalDist(100,
1E-300).inv_cdf(0.3) cleanly evaluates to 100.0but withsigma=1e-500``
the function previously would raised an unexpected error.

youknowone pushed a commit to RustPython/RustPython that referenced this pull request


Add kde function and tests to RustPython statistics module (#6030)

* Copy CPython 3.13 statistics module into RustPython

* Adjust CPython "magic constants" in KDE tests

## test_kde

I'm not sure why, but the second for loop in this test, which calculates
the cumulative distribution and roughly estimates the area under the
curve, takes a few seconds to run.

## test_kde_random

On my laptop, RustPython takes more than an hour (a lower bound) to sort
a random list of 1_000_000 numbers. Dropping n to 30_000 keeps the sort
time reasonable. It is then necessary to relax the tolerance of the
math.isclose check, or the computed values may **randomly** fail due to
the higher variance caused by the smaller sample size.

* Reintroduce expected failure in test_statistics.TestNormalDict.test_slots

* Sync Rust `normal_dist_inv_cdf` with Python equivalent

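
The `test_kde_random` adjustment rests on a general fact about sampling noise. As a rough guide, assuming the checked quantity behaves like a sample mean (whose standard error scales as 1/sqrt(n)), the sketch below computes the inflation factor for going from 1_000_000 to 30_000 samples, then checks the same scaling with a small seeded simulation. It is an illustration, not the RustPython test itself; the sizes, trial count, seed, and helper `spread_of_means` are hypothetical choices.

```python
import math
import random
from statistics import fmean, stdev

# If the checked quantity behaves like a sample mean, its standard error
# scales like 1/sqrt(n); dropping n from 1_000_000 to 30_000 inflates it by:
inflation = math.sqrt(1_000_000 / 30_000)
print(round(inflation, 2))  # 5.77


def spread_of_means(n, trials, rng):
    """Hypothetical helper: stdev of `trials` means of n standard normal draws."""
    means = [fmean([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]
    return stdev(means)


# Small seeded simulation of the same effect; sizes chosen to run quickly.
rng = random.Random(12345)
small = spread_of_means(100, 300, rng)
large = spread_of_means(2_500, 300, rng)
print(small / large)  # roughly sqrt(2_500 / 100) = 5
```
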

