Adapting step_size and sag updates for sample_weight #31837

antoinebaker · 2025-07-25T13:59:11Z

Reference Issues/PRs

Helper for PR #31675 to fix sample_weight support in SAG(A) solvers #31536.

What does this implement/fix? Explain your changes.

In test_sag.py several fixes are made to properly handle sample_weight.

get_step_size: the formula for the Lipschitz smoothness constant is corrected to take sample_weight into account
sag: the updates for the weights and intercept are corrected
sag: the number of seen elements (which converges to n_samples) is replaced by the weighted sum of seen elements (which converges to sample_weight.sum())
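To make the first and third points concrete, here is a minimal pure-Python sketch. It is not the code from test_sag.py: the function names, the alpha_scaled parameter and the choice of bounding the Lipschitz constant by the largest weighted squared row norm are assumptions made for illustration, based on the idea that each per-sample loss is multiplied by its weight.

```python
def sag_step_size_sketch(X, sample_weight, alpha_scaled, fit_intercept=True, loss="log"):
    """Hypothetical 1/L step size with a sample_weight-aware Lipschitz bound."""
    # Curvature factor of the per-sample loss:
    # 1/4 for the logistic loss, 1 for the squared loss.
    factor = 0.25 if loss == "log" else 1.0
    # Assumption: weighting sample i by s_i scales its curvature bound by s_i.
    max_weighted_sq_norm = max(
        s * (sum(v * v for v in row) + int(fit_intercept))
        for row, s in zip(X, sample_weight)
    )
    lipschitz = factor * max_weighted_sq_norm + alpha_scaled
    return 1.0 / lipschitz


def weighted_seen_sum(seen, sample_weight):
    """Sum of the weights of the samples visited so far (instead of a plain count)."""
    return sum(sample_weight[i] for i in seen)


X = [[1.0, 2.0], [3.0, 0.0]]
unit_step = sag_step_size_sketch(X, [1.0, 1.0], alpha_scaled=0.5)
weighted_step = sag_step_size_sketch(X, [2.0, 1.0], alpha_scaled=0.5)
seen_total = weighted_seen_sum({0, 1}, [2.0, 1.0])
```

With unit weights the sketch reduces to the usual unweighted bound, and once every sample has been seen the weighted sum equals sum(sample_weight).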

A true_weights argument was added to sag for visualisation purposes (to plot the convergence towards the true minimum). This is useful for the notebook below but can be removed in the final PR. A tol argument was also added so that sag uses the same stopping criterion as _sag_fast.pyx.

cc @snath-xoc

github-actions · 2025-07-25T14:00:02Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d117189. Link to the linter CI: here

antoinebaker · 2025-07-25T14:17:50Z

See https://gist.github.com/antoinebaker/0fc40c94952a2da371bedb6caca53c10 : now both sag and saga converge towards the true minimum, for the weighted and repeated datasets. The weighted version no longer has convergence issues.

We illustrate the convergence on the blob dataset from test_classifier_matching.

We also illustrate with the dataset from our common test check_sample_weight_equivalence, which should pass after a similar fix in _sag_fast.pyx. Note that the sag and saga solvers, while being stochastic, actually yield a deterministic output once they have converged (the true minimum), so it is enough to test them with the deterministic repeated/weighted equivalence test.
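The reason the repeated/weighted test is sufficient can be checked directly on the objective. The sketch below is only an illustration: the function name, the toy data and the use of an L2-penalised logistic loss with labels in {-1, +1} are assumptions, not the code of check_sample_weight_equivalence. It shows that integer weights and row repetition give the same objective value at any w, hence the same minimiser.

```python
import math


def logistic_objective(w, X, y, alpha, sample_weight=None):
    """Weighted logistic loss plus L2 penalty, for labels in {-1, +1}."""
    if sample_weight is None:
        sample_weight = [1.0] * len(X)
    loss = 0.0
    for row, target, s in zip(X, y, sample_weight):
        margin = target * sum(wj * xj for wj, xj in zip(w, row))
        loss += s * math.log1p(math.exp(-margin))
    return loss + 0.5 * alpha * sum(wj * wj for wj in w)


# Toy data (made up for this example).
X = [[0.5, -1.0], [2.0, 0.3], [-1.5, 1.0]]
y = [1, -1, 1]
sample_weight = [2, 1, 3]

# Repeated dataset: row i appears sample_weight[i] times.
X_rep = [row for row, s in zip(X, sample_weight) for _ in range(s)]
y_rep = [t for t, s in zip(y, sample_weight) for _ in range(s)]

w = [0.7, -0.2]
weighted_value = logistic_objective(w, X, y, alpha=0.1, sample_weight=sample_weight)
repeated_value = logistic_objective(w, X_rep, y_rep, alpha=0.1)
```

Since the two objectives coincide everywhere, a solver that has converged must return the same coefficients on both datasets, up to the tolerance.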

ogrisel

Could you please add an xfailing test that checks that both the _sag_fast.pyx implementation and the (now fixed) Python implementation converge to the same solution with sample_weight, both when alpha is small-ish and when it is large-ish?

I am wondering if we shouldn't also adapt the content of https://gist.github.com/antoinebaker/0fc40c94952a2da371bedb6caca53c10 to turn it into a test.

I agree that once the Cython version is fixed, the existing common test can serve this purpose. However, it would only indirectly check the correctness of the reference Python version.

ogrisel · 2025-07-28T12:11:20Z

sklearn/linear_model/tests/test_sag.py

+        if (max_weight != 0 and max_change / max_weight <= tol) or (
+            max_weight == 0 and max_change == 0
+        ):
+            print(f"sag convergence after {epoch + 1} epochs")


Rather than printing things, I think the number of iterations before convergence should be reported in the results of this test helper function.

We could also rename n_iter to max_iter now that this function can check for early convergence.

Then we could update the existing tests to check that the effective number of iterations is always strictly lower than max_iter whenever we call it with a strictly positive tol value.
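A minimal sketch of what this could look like, under simplifying assumptions: run_with_early_stopping and the toy update are hypothetical names, and the change is measured once per epoch rather than tracked across per-sample updates. The stopping condition is the one from the diff above, but the helper returns the effective number of iterations instead of printing it.

```python
def run_with_early_stopping(update, weights, max_iter, tol):
    """Hypothetical helper: apply `update` once per epoch, return (weights, n_iter)."""
    n_iter = max_iter  # reported when the tolerance is never reached
    for epoch in range(max_iter):
        new_weights = update(weights)
        max_change = max(abs(n - o) for n, o in zip(new_weights, weights))
        max_weight = max(abs(n) for n in new_weights)
        weights = new_weights
        if (max_weight != 0 and max_change / max_weight <= tol) or (
            max_weight == 0 and max_change == 0
        ):
            n_iter = epoch + 1
            break
    return weights, n_iter


def toy_update(w):
    """Toy contraction towards 1.0, standing in for one epoch of the solver."""
    return [0.5 * (wj + 1.0) for wj in w]


max_iter = 100
final_weights, n_iter = run_with_early_stopping(toy_update, [0.0], max_iter, tol=1e-3)
```

A test can then assert n_iter < max_iter whenever tol is strictly positive, rather than relying on printed output.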

fix convergence

d117189

github-actions bot added the module:linear_model label Jul 25, 2025

ogrisel mentioned this pull request Jul 28, 2025

Fix sample weight handling in SAG(A) #31675

Open

1 task

ogrisel added the No Changelog Needed label Jul 28, 2025

ogrisel reviewed Jul 28, 2025
