Add stricter gradient check for log marginal likelihood in Gaussian Processes #31543
Conversation
Somehow some commits from other branches got mixed up in this PR, hence the weird number of commit messages (the files should be restored now).
Are we going to go with scipy.differentiate as suggested by @lorentzenchr?
@conradstevens yes, your initial manual_grad implementation is similar to …
Could you now mark the failing tests with pytest's xfail?
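For reference, a hypothetical way to mark individual parametrized cases as expected failures could look roughly like the sketch below; which kernels actually need marking, and the reason string, are placeholders rather than details taken from this PR.

```python
# Hypothetical sketch: mark one parametrized case as an expected failure while
# the analytic gradient bug is still present (placeholder kernels and reason).
import pytest
from sklearn.gaussian_process.kernels import RBF, RationalQuadratic

kernels = [
    RBF(length_scale=1.0),
    pytest.param(
        RationalQuadratic(),
        marks=pytest.mark.xfail(reason="LML gradient mismatch, see gh-31366"),
    ),
]

@pytest.mark.parametrize("kernel", kernels)
def test_lml_gradient(kernel):
    ...  # stricter gradient check goes here
```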
Let me merge …
Thanks for the PR, here is a first pass of reviews.
@@ -105,17 +105,37 @@ def test_converged_to_local_maximum(kernel):
     )


-@pytest.mark.parametrize("kernel", kernels)
+@pytest.mark.parametrize("kernel", non_fixed_kernels[:-1])
Could you please include the product kernel in this list?
The code could be updated to find the name of the relevant kernel parameter using:

    length_scale_param_name = next(
        name for name in kernel.get_params() if name.endswith("length_scale")
    )

and then the for loop can be updated to use kernel.set_params(**{length_scale_param_name: length_scale}) instead of kernel.length_scale = length_scale.
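For illustration, the updated loop might then look roughly like this (a sketch, assuming kernel and length_scales are the objects already defined in the test):

```python
# Sketch of the suggested loop: set the looked-up parameter by name so the test
# also works for composite kernels (e.g. a product kernel, where the relevant
# hyperparameter has a nested name such as "k1__length_scale").
length_scale_param_name = next(
    name for name in kernel.get_params() if name.endswith("length_scale")
)
for length_scale in length_scales:
    kernel.set_params(**{length_scale_param_name: length_scale})
    # ... evaluate the log marginal likelihood gradient at kernel.theta here
```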
        1
    ][0]

    lml_gradient_manual = derivative(
I prefer to keep using the lml_gradient_approx name for this variable, as scipy.differentiate.derivative is doing an approximate (numerical) computation rather than a manually implemented (symbolically derived) computation.
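For context, a tiny standalone example (unrelated to the sklearn test code) of why both names describe numerical approximations; only the algorithm and accuracy differ:

```python
# Both helpers return numerical approximations of a derivative.
import numpy as np
from scipy.differentiate import derivative  # requires a recent SciPy release
from scipy.optimize import approx_fprime

# Forward finite differences with a fixed step size:
grad_fd = approx_fprime(np.array([1.0]), lambda x: np.sin(x[0]), 1e-8)
# Adaptive central differences with error control:
grad_adaptive = derivative(np.sin, 1.0).df

print(grad_fd[0], grad_adaptive, np.cos(1.0))  # all close to cos(1) ~ 0.5403
```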
    lml_gradient_approx = approx_fprime(
        kernel.theta, lambda theta: gpc.log_marginal_likelihood(theta, False), 1e-10
    )

    length_scales = np.linspace(1, 25, 1_000)
I would rather use np.logspace(-3, 3, 100):
- reducing the number of steps should make the test run faster;
- using a logspace on this range, typically used to configure such parameter bounds, should still check for most interesting values of that parameter.
@@ -140,17 +140,36 @@ def test_solution_inside_bounds(kernel):
     assert_array_less(gpr.kernel_.theta, bounds[:, 1] + tiny)


-@pytest.mark.parametrize("kernel", kernels)
+@pytest.mark.parametrize("kernel", non_fixed_kernels[:2])
 def test_lml_gradient(kernel):
Suggested change:
  def test_lml_gradient(kernel):
+     # Clone the kernel object prior to mutating it to avoid any side effects between
+     # GP tests:
+     kernel = clone(kernel)
@@ -105,17 +105,37 @@ def test_converged_to_local_maximum(kernel):
     )


-@pytest.mark.parametrize("kernel", kernels)
+@pytest.mark.parametrize("kernel", non_fixed_kernels[:-1])
 def test_lml_gradient(kernel):
I think we should always clone the kernel argument passed to test functions: otherwise, mutation of this object can cause other unrelated tests to fail (as is the case in this PR).
Suggested change:
  def test_lml_gradient(kernel):
+     # Clone the kernel object prior to mutating it to avoid any side effects between
+     # GP tests:
+     kernel = clone(kernel)
Note that adding from sklearn.base import clone is also needed.
Note: this change might also be needed to get a thread-safe test suite (as being investigated in the draft #30041).
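Put together, the resulting test header might look roughly as follows (a sketch, not the exact diff applied in this PR):

```python
# Sketch combining the suggested change with the required import: cloning the
# parametrized kernel before mutating it keeps other GP tests unaffected.
from sklearn.base import clone

def test_lml_gradient(kernel):
    # Clone the kernel object prior to mutating it to avoid any side effects
    # between GP tests:
    kernel = clone(kernel)
    ...  # the rest of the gradient check operates on the cloned kernel
```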
    length_scales = np.linspace(1, 25, 1_000)

    def evaluate_grad_at_length_scales(length_scales):
        result = np.zeros_like(length_scales)
The shape of the result array seems invalid when len(kernel.theta) != 1. Instead, you could store the results in a Python list and then stack the results.
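For illustration, a shape-safe variant could collect one result per step in a list and stack at the end. This is a hypothetical sketch: gp stands for the fitted estimator under test, kernel / length_scale_param_name / length_scales are assumed from the surrounding code, and the exact quantity evaluated per step is my reading of the excerpt.

```python
import numpy as np

# Hypothetical rewrite: append one gradient vector per length scale and stack,
# so the result keeps a valid shape even when len(kernel.theta) != 1.
def evaluate_grad_at_length_scales(length_scales):
    gradients = []
    for length_scale in length_scales:
        kernel.set_params(**{length_scale_param_name: length_scale})
        _, lml_gradient = gp.log_marginal_likelihood(kernel.theta, eval_gradient=True)
        gradients.append(lml_gradient)
    return np.stack(gradients)  # shape: (n_length_scales, n_theta)
```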
@@ -140,17 +140,36 @@ def test_solution_inside_bounds(kernel):
     assert_array_less(gpr.kernel_.theta, bounds[:, 1] + tiny)


-@pytest.mark.parametrize("kernel", kernels)
+@pytest.mark.parametrize("kernel", non_fixed_kernels[:2])
Please test against all non_fixed_kernels, or add a comment to explain why some kernels are left out of this test (if there is a good reason for that).
First step in fixing the error in the log-marginal likelihood gradient calculations in Gaussian Processes as noticed in #31366. Also related to #31289.
What does this implement/fix? Explain your changes.
Implements a stricter test for test_lml_gradient by replacing the manual gradient calculation using scipy.optimize.approx_fprime with scipy.differentiate.derivative, and by calculating the gradient over several different length_scales.

@conradstevens and @lorentzenchr, any suggestions are welcome (cc @ogrisel and @antoinebaker).
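For readers following along, here is a rough, self-contained sketch of what such a stricter check can look like. It is not the code added by this PR: the dataset, grid, and tolerance are placeholders, and scipy.differentiate requires a recent SciPy release.

```python
# Sketch of a stricter LML gradient check: compare the analytic gradient against
# a scipy.differentiate.derivative estimate over a grid of length scales.
import numpy as np
from scipy.differentiate import derivative
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = make_classification(
    n_samples=30, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
gpc = GaussianProcessClassifier(kernel=RBF(), optimizer=None).fit(X, y)

def lml(log_length_scale):
    # derivative() evaluates its callable on arrays of abscissae, so loop over
    # the raveled input and reshape to honour the element-wise contract.
    t = np.asarray(log_length_scale, dtype=float)
    values = [gpc.log_marginal_likelihood(np.atleast_1d(ti)) for ti in t.ravel()]
    return np.reshape(values, t.shape)

for length_scale in np.logspace(-3, 3, 10):
    theta = np.log([length_scale])
    _, lml_gradient = gpc.log_marginal_likelihood(theta, eval_gradient=True)
    lml_gradient_approx = derivative(lml, theta[0]).df
    # The analytic gradient should match the numerical estimate; a check of this
    # kind is what exposes the discrepancy reported in gh-31366.
    np.testing.assert_allclose(lml_gradient[0], lml_gradient_approx, rtol=1e-4)
```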
TO DO (perhaps within this PR or separately):
- theta under GaussianProcessRegressor
- (b) in _BinaryGaussianProcessClassifierLaplace