
Add stricter gradient check for log marginal likelihood in Gaussian Processes #31543


Open

snath-xoc wants to merge 14 commits into main

Conversation

snath-xoc
Contributor

@snath-xoc snath-xoc commented Jun 13, 2025

First step in fixing the error in the log-marginal likelihood gradient calculations in Gaussian Processes as noticed in #31366. Also related to #31289.

What does this implement/fix? Explain your changes.

Implements a stricter test for test_lml_gradient by replacing the manual gradient calculation using scipy.optimize.approx_fprime with scipy.differentiate.derivative, and by computing the gradient over several different length_scale values.
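
For illustration, a minimal sketch of the kind of check this enables (not the PR's actual test code; the dataset, kernel, loop range, and tolerance below are assumptions):

import numpy as np
from scipy.differentiate import derivative  # requires SciPy >= 1.15
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
gpr = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0), alpha=1e-6, optimizer=None
).fit(X, y)


def lml(theta):
    # derivative() evaluates its callable on arrays of abscissae, so evaluate
    # the scalar log-marginal likelihood elementwise.
    return np.asarray(
        [gpr.log_marginal_likelihood(np.atleast_1d(t)) for t in np.ravel(theta)]
    ).reshape(np.shape(theta))


for length_scale in np.logspace(-0.5, 0.5, 5):
    theta0 = np.log(length_scale)  # kernel.theta stores log(length_scale)
    _, lml_grad_analytic = gpr.log_marginal_likelihood(
        np.atleast_1d(theta0), eval_gradient=True
    )
    lml_grad_numeric = derivative(lml, theta0)
    np.testing.assert_allclose(lml_grad_analytic[0], lml_grad_numeric.df, rtol=1e-4)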

@conradstevens and @lorentzenchr, any suggestions are welcome (cc @ogrisel and @antoinebaker).

TO DO (perhaps within this PR or separately):

  • Implement chain rule calculation for theta under GaussianProcessRegressor (a toy check of the underlying relation is sketched after this list)
  • Implement chain rule calculation for the maximum a posteriori estimate (b) in _BinaryGaussianProcessClassifierLaplace
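
For reference, kernel hyperparameters in scikit-learn are stored log-transformed (theta = log(length_scale)), so a gradient taken with respect to the raw length scale picks up a chain-rule factor of length_scale. A toy numerical check of that relation (purely illustrative, not GP code):

import numpy as np
from scipy.differentiate import derivative


def rbf_value(length_scale):
    # RBF kernel value at unit distance, as a function of the raw length scale.
    return np.exp(-0.5 / length_scale**2)


length_scale = 3.0
d_dlength_scale = derivative(rbf_value, length_scale).df
d_dtheta = derivative(lambda theta: rbf_value(np.exp(theta)), np.log(length_scale)).df

# With theta = log(length_scale): d/dtheta = length_scale * d/dlength_scale
np.testing.assert_allclose(d_dtheta, length_scale * d_dlength_scale, rtol=1e-6)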


github-actions bot commented Jun 13, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: da433c0.

@snath-xoc
Contributor Author

Somehow, some commits from other branches got mixed up in this PR, hence the odd set of commit messages (the files should be restored now).

@conradstevens
Contributor

Are we going to go with scipy.differentiate as suggested by @lorentzenchr?

@snath-xoc
Contributor Author

snath-xoc commented Jun 19, 2025

@conradstevens yes, your initial manual_grad implementation is similar to approx_fprime, and derivative seems to yield similar results. I think they could be used interchangeably; derivative is roughly based on a Taylor-series approximation and seems to be more accurate. The main addition is that we check the gradient over many different length scales, which differs from the previous implementation.
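
To get a feel for the difference between the two schemes on a toy function (illustrative only): approx_fprime takes a single forward difference with a fixed step, while scipy.differentiate.derivative uses higher-order central differences with iteratively refined step sizes and an error estimate.

import numpy as np
from scipy.optimize import approx_fprime
from scipy.differentiate import derivative

x0 = 1.3
exact = np.cos(x0)  # d/dx sin(x) = cos(x)

# Forward difference with a fixed step of 1e-8 (the approx_fprime approach).
fwd = approx_fprime(np.atleast_1d(x0), lambda x: np.sin(x[0]), 1e-8)[0]

# Higher-order central differences with adaptive step refinement.
res = derivative(np.sin, x0)

print(abs(fwd - exact))     # roughly 1e-8 for a first-order forward difference
print(abs(res.df - exact))  # typically several orders of magnitude smaller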

@lorentzenchr
Member

Could you now mark the failing tests with pytest's xfail?
Then another PR can fix the gradient.

@ogrisel
Member

ogrisel commented Jul 28, 2025

Let me merge main into this branch to get up-to-date CI reports.

Member

@ogrisel ogrisel left a comment


Thanks for the PR, here is a first pass of reviews.

@@ -105,17 +105,37 @@ def test_converged_to_local_maximum(kernel):
)


-@pytest.mark.parametrize("kernel", kernels)
+@pytest.mark.parametrize("kernel", non_fixed_kernels[:-1])
Member


Could you please include the product kernel in this list?

The code could be updated to find the name of the relevant kernel parameter using:

length_scale_param_name = next(
    name for name in k.get_params() if name.endswith("length_scale")
)

and then the for loop can be updated to use kernel.set_params(**{length_scale_param_name: length_scale}) instead of kernel.length_scale = length_scale.
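
Putting those two pieces together, the loop could look something like this (untested sketch, with an arbitrary loop range, not a concrete suggestion for the final test body):

import numpy as np
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# The same lookup works for plain and product kernels, because get_params()
# exposes nested parameter names such as "k2__length_scale".
for kernel in [RBF(length_scale=1.0), ConstantKernel(2.0) * RBF(length_scale=1.0)]:
    length_scale_param_name = next(
        name for name in kernel.get_params() if name.endswith("length_scale")
    )
    for length_scale in np.logspace(-3, 3, 5):
        kernel.set_params(**{length_scale_param_name: length_scale})
        # ... run the analytic vs. numerical gradient comparison here ...
    print(kernel)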


lml_gradient_manual = derivative(
Member


I prefer to keep using the lml_gradient_approx name for this variable, as scipy.differentiate.derivative is doing an approximate (numerical) computation rather than a manually implemented (symbolically derived) computation.

lml_gradient_approx = approx_fprime(
    kernel.theta, lambda theta: gpc.log_marginal_likelihood(theta, False), 1e-10
)
length_scales = np.linspace(1, 25, 1_000)
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather use np.logspace(-3, 3, 100):

  • reducing the number of steps should make the test run faster;
  • using a logspace over this range (the range typically used to configure such parameter bounds) should still cover the most interesting values of that parameter.

@@ -140,17 +140,36 @@ def test_solution_inside_bounds(kernel):
assert_array_less(gpr.kernel_.theta, bounds[:, 1] + tiny)


-@pytest.mark.parametrize("kernel", kernels)
+@pytest.mark.parametrize("kernel", non_fixed_kernels[:2])
def test_lml_gradient(kernel):
Member


Suggested change:
-def test_lml_gradient(kernel):
+def test_lml_gradient(kernel):
+    # Clone the kernel object prior to mutating it to avoid any side effects between
+    # GP tests:
+    kernel = clone(kernel)

@@ -105,17 +105,37 @@ def test_converged_to_local_maximum(kernel):
)


-@pytest.mark.parametrize("kernel", kernels)
+@pytest.mark.parametrize("kernel", non_fixed_kernels[:-1])
def test_lml_gradient(kernel):
Member


I think we should always clone the kernel argument passed to test functions: otherwise, mutation of this object can cause other unrelated tests to fail (as is the case in this PR).

Suggested change:
-def test_lml_gradient(kernel):
+def test_lml_gradient(kernel):
+    # Clone the kernel object prior to mutating it to avoid any side effects between
+    # GP tests:
+    kernel = clone(kernel)

Note that adding from sklearn.base import clone is also needed.

Note: this change might also be needed to get a thread-safe test suite (as being investigated in the draft #30041).

length_scales = np.linspace(1, 25, 1_000)

def evaluate_grad_at_length_scales(length_scales):
    result = np.zeros_like(length_scales)
Member


The shape of the result array seems invalid when len(kernel.theta) != 1.

Instead, you could store the results in a Python list and then stack the results.
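
For example, something along these lines (sketch with a placeholder gradient function, not the actual test code):

import numpy as np


def compute_lml_gradient(length_scale, n_theta=2):
    # Placeholder standing in for the per-length-scale gradient computation.
    return np.full(n_theta, length_scale)


def evaluate_grad_at_length_scales(length_scales):
    # Accumulate per-length-scale gradients in a list and stack once at the end;
    # this works for any len(kernel.theta), unlike np.zeros_like(length_scales).
    results = [compute_lml_gradient(length_scale) for length_scale in length_scales]
    return np.stack(results)  # shape: (n_length_scales, n_theta)


print(evaluate_grad_at_length_scales(np.logspace(-3, 3, 5)).shape)  # (5, 2)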

@@ -140,17 +140,36 @@ def test_solution_inside_bounds(kernel):
assert_array_less(gpr.kernel_.theta, bounds[:, 1] + tiny)


-@pytest.mark.parametrize("kernel", kernels)
+@pytest.mark.parametrize("kernel", non_fixed_kernels[:2])
Member


Please test against all non_fixed_kernels or add a comment to explain why some kernels are left out of this test (if there is a good reason for that).
